ebanks
6e855809e1
Renaming and moving relevant tools into a sequenom directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2971 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 02:31:10 +00:00
asivache
c638c29eea
In reference traversals, this view did not expect the possibility of TWO alignment contexts (base pileup followed by extended event pileup) associated with the same location. As a result, extended event pileups were silently skipped even when enabled in the traversal engine. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2970 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 22:18:44 +00:00
ebanks
bc3761dc16
allow clipper to use original quals if requested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2969 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 21:50:31 +00:00
ebanks
f096a958d6
Initial commit for Andrey of plumbing for indels. Not finished - need to track down bug with him.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2967 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:13:01 +00:00
chartl
0a49dffa8f
Row/Column names are now R-friendly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
ebanks
0e360ea8af
Alleles now hash correctly.
Special thanks to Matt & Aaron.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2965 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 18:09:44 +00:00
ebanks
5a20bf0e64
3 changes to UG which break integration tests:
1. emit AA,AB,BB likelihoods in the FORMAT field for Mark
2. remove constraint that genotype alleles (in the GT field) need to be lexicographically sorted.
3. Add bam file(s) used by genotyper to header for Kiran
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2963 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 17:16:47 +00:00
hanna
cdce639bae
Partially reclaim performance lost during integration test fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2961 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 12:36:11 +00:00
ebanks
9f3b99c11b
Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system.
Removing obsolete genotyping classes.
First stage of removing dependence on old Genotype class.
More changes to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 03:41:07 +00:00
hanna
02f48b6457
Fix bug that's been in the GATK for a very long time: update nReads (as well
as nRecords), so that INFO logging doesn't say 'skipped 0 of 0 reads'. While
I'm in there, update TraversalStatistics to store longs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2959 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 22:44:54 +00:00
chartl
21bf8b4b93
Odd, what I saw on IntelliJ hadn't saved to sting before committing. Here's the actual change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2956 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 15:54:41 +00:00
rpoplin
fe8a8b9199
Hooked up both optimization models via command line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2955 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:49:59 +00:00
chartl
cc6a714c09
Handle excess coverage in interval output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2954 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:40:05 +00:00
rpoplin
ca2a0266dc
Converting annotation values that are set to Double.Infinity
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2953 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:04:33 +00:00
rpoplin
b42e0a398e
Bug fix in variant optimizer for when there are more novel variants than known variants in the callset. Changing the magic numbers related to the starting sigma values for the Gaussian clusters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2952 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 13:02:08 +00:00
hanna
e4360bac6a
More comprehensive support when sharding for ref walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2951 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 11:25:20 +00:00
hanna
eb165ca844
Celebrate the fact that the new sharding system works with integration tests
by removing the scary debug line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2950 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 23:40:56 +00:00
hanna
9e107513d0
In the new sharding system, if no read group is present, hallucinate one. Added
for test compatibility, but not sure whether we still need this feature. TODO: Poll the group about this feature.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2949 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 23:01:34 +00:00
hanna
a7fe07c404
A few stopgap fixes to get the GATK to the point where the old sharding
infrastructure can be torn down:
1) New sharding system emulates old MonolithicSharding mechanism.
2) Better awareness of differences between fasta and BAM files when creating
shards.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2948 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 21:01:25 +00:00
hanna
dd6122f682
Fixed another bug in the original sharding system. Updated integration tests
as appropriate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2947 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 15:32:18 +00:00
hanna
ee2ec7ced9
Fix off-by-one error in original implementation of read sharding. Tested by
awking output of BamToFastq vs. samtools until the outputs matched exactly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2945 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-06 18:52:53 +00:00
hanna
1ef1091f7c
Cleanup and simplification of read interval sharding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2944 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 23:34:38 +00:00
ebanks
7fa0f77721
add output for number of variants that validated as true
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2942 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 18:57:44 +00:00
chartl
037ac9c9af
Actually calculate base counts by read group when "both" is specified. Modified integration test to cement the now-correct "both" behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2941 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 18:31:48 +00:00
chartl
8738c544f1
Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
rpoplin
95d560aa2f
More incremental updates to the variant optimizer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2939 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 16:42:42 +00:00
hanna
7a7e85188c
Better eagerDecode default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2938 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 16:42:23 +00:00
depristo
33cefddf55
Better INFO field annotation for Mendel violations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2937 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 15:22:04 +00:00
ebanks
9f7ebe1e1c
- add name to vcf od field
- don't do HW calculation if everything is a no-call
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2936 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 01:43:01 +00:00
hanna
7104a3a96c
Fix for accumulator exception when running reduce by interval walkers without
intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2935 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 01:04:08 +00:00
ebanks
9eb122924f
misc cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2933 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:34:13 +00:00
chartl
706d49d84c
Commit for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
ebanks
c20d3e567e
Now outputs fully spec-compliant VCF with proper annotations. Emits statistics as to number of good/bad records.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2931 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:28:17 +00:00
hanna
adea38fd5e
Sharding system fixes for corner cases generally related to lack of coverage
in the BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2928 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 18:59:21 +00:00
chartl
a4d494c38b
Add option to adhere to the PlinkRod naming convention [ProjectName]|c[Chrom]_p[Pos]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2927 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 18:31:27 +00:00
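The PlinkRod naming convention above, `[ProjectName]|c[Chrom]_p[Pos]`, can be illustrated with a tiny helper. This is a sketch only: `PlinkProbeName` and `format` are hypothetical names, not the PickSequenomProbes code, and the chromosome string is assumed to be passed through exactly as given.

```java
/** Hypothetical helper illustrating the [ProjectName]|c[Chrom]_p[Pos] naming convention. */
final class PlinkProbeName {
    private PlinkProbeName() {}

    /** Concatenates project, chromosome and position into the PlinkRod-style name. */
    static String format(String projectName, String chrom, long pos) {
        return projectName + "|c" + chrom + "_p" + pos;
    }
}
```

For example, `PlinkProbeName.format("MyProject", "1", 12345)` yields `MyProject|c1_p12345`.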
ebanks
0dd65461a1
Various improvements to plink, variant context, and VCF code.
We almost completely support indels. Not yet done with plink stuff.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 17:58:01 +00:00
aaron
c8077b7a22
Waypoint check-in: a couple of changes for Tribble, and adding some options to the integration test for passing in auxiliary files that aren’t “%s” command line options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2925 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 16:02:21 +00:00
chartl
6759acbdef
Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME|CHR_POS and not PROJECT_NAME|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 15:00:02 +00:00
hanna
023654696e
First pass at handling SAMFileReaders using a SAMReaderID. This allows us to firewall
GATK users from the readers, which they could abuse in ways that could destabilize the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2923 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 00:59:32 +00:00
rpoplin
b241e0915b
Incremental update to VariantOptimizer. Refactored parts of the clustering code to make it more clear. More comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2922 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 20:33:35 +00:00
asivache
073fdd8ec7
Let's try not to die suffocating when a bad region with humongous coverage is encountered. New option: -maxNumberOfReads (--mnr), with default of 10,000. If count of reads cached in the current window reaches the specified limit, the whole window is immediately shifted by the whole window length and all currently cached reads are dropped. NOTE: this also means that we are not going to call ANY indels from the current window, even though we could try using just the reads cached so far.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2921 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 17:34:30 +00:00
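The -maxNumberOfReads rule described above can be modeled in a few lines. This is a minimal sketch, not the indel caller's code: `ReadWindow`, `addRead`, and the other names are hypothetical, reads are reduced to their start positions, and normal window sliding and indel calling are omitted. It only mirrors the stated behavior: once the cache reaches the limit, the window jumps forward by one whole window length and every cached read is dropped, so nothing is called from that window.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a read window that abandons itself when coverage is too deep. */
final class ReadWindow {
    private final int windowLength;
    private final int maxNumberOfReads;
    private final List<Long> cachedReadStarts = new ArrayList<>();
    private long windowStart;
    private int windowsAbandoned = 0;

    ReadWindow(long windowStart, int windowLength, int maxNumberOfReads) {
        this.windowStart = windowStart;
        this.windowLength = windowLength;
        this.maxNumberOfReads = maxNumberOfReads;
    }

    /** Caches one read; when the cache hits the limit, jumps ahead and discards everything. */
    void addRead(long readStart) {
        if (readStart < windowStart) {
            return; // read belongs to a region that was already abandoned
        }
        cachedReadStarts.add(readStart);
        if (cachedReadStarts.size() >= maxNumberOfReads) {
            windowStart += windowLength; // shift by one whole window length
            cachedReadStarts.clear();    // the dropped reads are never used for calling
            windowsAbandoned++;
        }
    }

    long windowStart() { return windowStart; }
    int cachedReads() { return cachedReadStarts.size(); }
    int windowsAbandoned() { return windowsAbandoned; }
}
```

With a limit of 3, the third read in a window triggers the jump; any later read that starts before the new window start is ignored.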
chartl
6ca6c98980
Can just give PickSequenomProbes a dbsnp rod to mask
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2920 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 16:50:58 +00:00
aaron
ca2cd9d4f5
a little clean-up: move setting the bases of generated reads into Artificial SAM Utils now that the clean read injector test is gone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2919 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 16:31:45 +00:00
aaron
790d2a7776
adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
ebanks
0e9a6826b0
Update to VCF code to get it up to spec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2917 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 06:12:42 +00:00
ebanks
317fac8dff
Better error message for --assume_single_sample_reads screw up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2916 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 01:03:10 +00:00
hanna
104f4f7383
Mediocre implementation of reader pooling within the SAM data source. Will fix this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2915 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 22:35:02 +00:00
ebanks
5f3c80d9aa
1. To make indel calls, we need to get rid of the SNP-centricity of our code. First step is to have the reference be a String, not a char in the Genotype. Note that this is just a temporary patch until the genotype code is ported over to use VariantContext.
2. Significant refactoring of Plink code to work in the rods and use VariantContext. More coming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2913 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:26:40 +00:00
ebanks
6ceae22793
utility methods for genotype counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2912 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 20:23:41 +00:00
kcibul
7578678f99
refactored to provide a sum of mismatch quality scores capability as well (used by Cancer)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2911 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 16:40:03 +00:00
aaron
232fcf829a
removing the unsupported VCF validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2909 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 15:45:33 +00:00
hanna
1b572b192a
Stopgap fix for temporary problems sharding when indexless. A more compelling solution will come later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2908 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-02 02:59:14 +00:00
hanna
75a541b479
Fix nasty issue where shard boundaries aren't properly clipped during locus traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2907 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 23:31:58 +00:00
rpoplin
af6e476df5
Copyright compliant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2905 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:29:34 +00:00
rpoplin
3a863d3e8c
Initial check in of VariantOptimizer in playground. There is a Gaussian Mixture Model version and a k-Nearest Neighbors version. There is still lots of work to do. Nobody should be using it yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2904 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 15:26:18 +00:00
hanna
6133d73bf0
Locus (non-intervalled) traversal with new sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2903 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-01 01:58:44 +00:00
hanna
80f5d2829d
Support for read interval sharding with proper filtering.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2902 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-27 20:26:34 +00:00
aaron
d8fedd59be
docs, cleanup, and some improvements to the iterators.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2901 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 22:36:04 +00:00
hanna
b69c2d0f70
Cleanup. Remove some unnecessary methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2900 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:50:48 +00:00
hanna
30eb28886b
Basic functionality for intervaled reads in new sharding system. Not currently filtering out cruft, so
the mode of operation is currently queryOverlapping rather than queryContained.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2899 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 21:41:55 +00:00
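The difference between the queryOverlapping and queryContained modes mentioned above comes down to two interval predicates. A minimal sketch, assuming 1-based closed coordinates on a single contig; `IntervalQueryModes`, `overlaps`, and `contains` are hypothetical helpers, not the Picard reader methods themselves.

```java
/** Hypothetical predicates mirroring overlapping vs. contained interval queries. */
final class IntervalQueryModes {
    private IntervalQueryModes() {}

    /** True if read [readStart, readEnd] shares at least one base with interval [start, end]. */
    static boolean overlaps(int readStart, int readEnd, int start, int end) {
        return readStart <= end && readEnd >= start;
    }

    /** True if read [readStart, readEnd] lies entirely within interval [start, end]. */
    static boolean contains(int readStart, int readEnd, int start, int end) {
        return readStart >= start && readEnd <= end;
    }
}
```

A read spanning 90-140 against the interval 100-200 is returned in overlapping mode but not in contained mode; reads like that are the extra "cruft" that the overlapping mode lets through.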
chartl
cfff486338
This commit is for Kiran
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl
87f8fb7282
Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
aaron
622554d7bd
disable a part of the ROD for Reads code until the rest of the system goes live
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2896 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:15:42 +00:00
chartl
496ecc8186
Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end)
This is a mid-way commit for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:51:12 +00:00
hanna
1017a38f38
Initial refactoring of read traversal to make it easier to drop in intervalled reads traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2894 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:09:09 +00:00
depristo
9a6b384adb
Support for no qual fields in VCF; better support for Mendelian violation calculations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron
246fa28386
RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl
591102a841
Don't close the output stream if we're printing to stdout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:50:58 +00:00
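A minimal sketch of the guard this commit describes, assuming the walker holds its output as a `java.io.PrintStream`; `OutputCloser.closeUnlessStdout` is a hypothetical name. Closing `System.out` would silence any later console output from the same JVM, so the sketch only flushes it.

```java
import java.io.PrintStream;

/** Hypothetical helper: close an output stream unless it is standard output. */
final class OutputCloser {
    private OutputCloser() {}

    /** Returns true if the stream was closed, false if it was left open because it is System.out. */
    static boolean closeUnlessStdout(PrintStream out) {
        if (out == System.out) {
            out.flush(); // write out anything buffered, but keep stdout usable
            return false;
        }
        out.close();
        return true;
    }
}
```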
chartl
10cc71ceb0
Another midway commit for teh engineerz
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:24:02 +00:00
hanna
3289826892
Fix chartl's issue -- reduceInit() is sometimes called unnecessarily at the
end of a traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2889 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:02:18 +00:00
chartl
3d92e5a737
Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).
Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
hanna
553d39bb00
Clean up the code a bit following the introduction of reduceByInterval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2887 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 01:20:22 +00:00
hanna
199b43fcf2
Reduce by interval alterations to interface with new sharding system. This checkin will be followed by a
simplification of some of the locus traversal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
asivache
2572c24935
We were still dropping halves of some pairs, in which both reads were assigned to the same position. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2885 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 23:13:23 +00:00
aaron
fef1154fc8
starting on RODs for Reads: made RODRecordList implement List<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
chartl
5df37968de
Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settings)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:20:18 +00:00
asivache
27d3ef9458
Got rid of annoying commented printouts; no functional changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2881 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:12:30 +00:00
asivache
d73bc490c2
Do not build alt consensuses from insertions that have an N in the inserted sequence. Seems to cause problems rather than solve any
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2880 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 03:00:26 +00:00
asivache
94d74d4f78
Multiple instances of the same consensus were all living happily together in the set of alt consensuses. As a result, we have been taking a considerable performance hit from trying to align all reads to those instances over and over again. Fixed. Only one copy of any given alt consensus is now stored.
in class Consensus:
1) use Arrays.equals() to compare java arrays!!
2) if object overrides equals() it also MUST provide appropriate hashCode() (thanks, Matt)
As a side effect, a number of commented out debug prints are committed, still need them...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2879 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 02:09:50 +00:00
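The two lessons above (compare array contents with `Arrays.equals()`, and override `hashCode()` whenever `equals()` is overridden) are what make set-based deduplication work. A minimal sketch using a hypothetical `AltConsensus` class, not the realigner's actual `Consensus`; defining equality by the consensus bases alone is an assumption.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an alt consensus, identified by its bases only. */
final class AltConsensus {
    private final byte[] bases;

    AltConsensus(byte[] bases) {
        this.bases = bases.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AltConsensus)) return false;
        // Arrays.equals compares contents; bases.equals(...) would only compare references.
        return Arrays.equals(bases, ((AltConsensus) o).bases);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal objects need equal hash codes, or a HashSet keeps duplicates.
        return Arrays.hashCode(bases);
    }
}
```

Adding two `AltConsensus` objects built from separate `"ACGT"` byte arrays to a `HashSet` leaves a single element, which is the "only one copy" behavior the commit describes.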
chartl
1f673e9fab
Float the bins with the given lower bound
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:48:53 +00:00
chartl
119d449b46
Formatting changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:43:15 +00:00
chartl
173956927b
Summaries generated for firehose from DoC output have been migrated to their own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit; bug-fixing and testing are upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:41:02 +00:00
hanna
491b30e8de
Eliminate a few stray loci that weren't being filtered out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2875 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:00:52 +00:00
hanna
fff15944fe
Bug fix. Stopping condition of recurrence stopped too soon in some cases where an interval *contained* zero reads but *overlapped* with some reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2874 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 15:58:54 +00:00
hanna
a0e8de40cf
Bug fix: at one locus in the dataset, two reads were dropped.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2872 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 23:54:52 +00:00
aaron
5546aa4416
adding code to deal with the off-spec situation where our minimum likelihood is above the GLF max of 255.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2871 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:27:39 +00:00
hanna
88d0677379
Misc correctness enhancements: develop the bin selector into a recursive algorithm and return a shard when reads are missing. Also improve the performance of the read filter that clips reads not actually present in the shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2870 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 22:19:06 +00:00
ebanks
8b555ff17c
Killed the old cleaner code. Bye bye.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2868 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:49:58 +00:00
kshakir
3738b76320
Added a playground concordance analyzer for summarizing VariantEval across a group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 20:28:52 +00:00
ebanks
a640bd2d79
ignore uninteresting extended events
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2866 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:55:46 +00:00
rpoplin
32e5dceef9
Moving comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2865 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 19:27:31 +00:00
alecw
b236714c8a
Optimization - Added method to Covariates: void getValues( SAMRecord read, Comparable[] comparable ) which takes an array of size (at least) read.getReadLength() and fills it with covariate values for all positions in the given read. Made CovariateCounterWalker and TableRecalibrationWalker use this method instead of calling getValue(..) for each covariate and each offset.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2863 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 17:35:25 +00:00
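The idea behind the batch `getValues(read, comparable[])` method is to do per-read work once and then fill every position in one pass, instead of paying for a `getValue(read, offset)` call per offset. A self-contained sketch under stated assumptions: `SketchRead`, `SketchCovariate`, and `CycleCovariateSketch` are hypothetical; the real method takes a Picard `SAMRecord`, replaced here by a two-field stand-in; and the cycle rule (reverse the index for negative-strand reads) is a simplification.

```java
/** Hypothetical minimal read: length and strand are all this sketch needs. */
final class SketchRead {
    final int length;
    final boolean negativeStrand;

    SketchRead(int length, boolean negativeStrand) {
        this.length = length;
        this.negativeStrand = negativeStrand;
    }
}

/** Hypothetical covariate interface with per-offset and whole-read accessors. */
interface SketchCovariate {
    Comparable<?> getValue(SketchRead read, int offset);

    /** Fills values[0 .. read.length - 1]; values must have at least read.length slots. */
    void getValues(SketchRead read, Comparable<?>[] values);
}

/** Hypothetical cycle covariate: counts cycles from the sequenced 5' end of the read. */
final class CycleCovariateSketch implements SketchCovariate {
    @Override
    public Comparable<?> getValue(SketchRead read, int offset) {
        // Per-offset call: re-checks strand orientation on every invocation.
        return read.negativeStrand ? read.length - 1 - offset : offset;
    }

    @Override
    public void getValues(SketchRead read, Comparable<?>[] values) {
        // Batch call: strand is inspected once, then the array is filled in a tight loop.
        int n = read.length;
        if (read.negativeStrand) {
            for (int i = 0; i < n; i++) values[i] = n - 1 - i;
        } else {
            for (int i = 0; i < n; i++) values[i] = i;
        }
    }
}
```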
ebanks
32d14d988e
Overload parseIntervalRegion() to allow for the interval merging rule to be passed in (so one is not required to use the value from the GATK arg collection).
Now the IndelRealigner can use this functionality without being forced to merge abutting intervals (which was actually causing a problem with the cleaning).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2862 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-22 04:13:54 +00:00
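Why the merging rule matters to the IndelRealigner can be seen in a small sketch. Assumptions: `MergingRule`, `Interval`, and `IntervalMerger.merge` are hypothetical names, intervals are 1-based and closed on a single contig, and "abutting" means the next interval starts at the base right after the previous one ends.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical rules: merge abutting intervals too, or only truly overlapping ones. */
enum MergingRule { ALL, OVERLAPPING_ONLY }

/** Hypothetical closed interval on one contig, 1-based inclusive. */
final class Interval {
    final int start;
    final int end;

    Interval(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return start + "-" + end;
    }
}

final class IntervalMerger {
    private IntervalMerger() {}

    /** Sorts by start and merges according to the rule; the input list is not modified. */
    static List<Interval> merge(List<Interval> input, MergingRule rule) {
        List<Interval> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt(iv -> iv.start));
        List<Interval> out = new ArrayList<>();
        for (Interval iv : sorted) {
            if (!out.isEmpty()) {
                Interval last = out.get(out.size() - 1);
                // Under ALL, an interval starting at last.end + 1 (abutting) is also absorbed.
                int reach = rule == MergingRule.ALL ? last.end + 1 : last.end;
                if (iv.start <= reach) {
                    out.set(out.size() - 1, new Interval(last.start, Math.max(last.end, iv.end)));
                    continue;
                }
            }
            out.add(iv);
        }
        return out;
    }
}
```

Under `ALL`, the targets 1-10 and 11-20 collapse into one 1-20 target; under `OVERLAPPING_ONLY` they stay separate while genuinely overlapping targets such as 30-40 and 35-50 are still combined.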
hanna
cc09f48cd8
Correctness fix: index can concat chunks around shard edges, and my code didn't account for that.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2861 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 21:44:33 +00:00
chartl
0e05a3acb0
Adding depth of coverage features to firehose summary tools
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 19:47:16 +00:00
hanna
71f18e941f
Significant performance improvements made by subtracting out the contents of the prior highest-level bin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2859 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 16:46:16 +00:00
rpoplin
7f19ff1fa1
Added a new option in the recalibrator to be used by people who have SOLiD data in which only a few of the reads have no-calls in the color space. These reads will be skipped over and left in the bam file untouched.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2857 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 15:25:23 +00:00
aaron
b1a4e6d840
removing non-ascii characters from my Copyright and from VariantEval2Walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2856 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:54:36 +00:00
aaron
33ae256186
a start to some of the infrastructure for Tribble, including dynamic detection of new RMD; not nearly wired in or complete yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2855 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:43:52 +00:00
ebanks
bbbad79f8c
Forgot to remove debugging code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2854 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:12:58 +00:00
ebanks
7669eaaeb3
Optimizations to the cleaner algorithm; reduce total runtime by almost 20%.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2852 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:10:56 +00:00
ebanks
79ab7affda
- Change sortOnDisk option to sortInMemory
- Fix horrible cleaner bug
- Trivial optimizations to cleaner code - more significant ones coming soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2850 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-17 20:52:57 +00:00
ebanks
2520889cb3
Check for bad intervals and don't emit them
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2849 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 21:42:36 +00:00
aaron
653f70efa2
added methods to validate an interval before you try to make a GenomeLoc: boolean validGenomeLoc().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2846 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-16 20:35:35 +00:00
chartl
01af3d0663
Update an error message :)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2842 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 23:24:06 +00:00
jmaguire
81313d9452
added class VCFMerge
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2840 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:41:50 +00:00
jmaguire
0ef50bcae7
- update to match recent changes in the VCF parser
- compute Het Error Rate in VCFConcordance
- changes to the frequency-specific optimizer
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2839 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-15 14:27:01 +00:00
depristo
8072e9aed5
should never commit without running integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2838 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 23:42:37 +00:00
depristo
a1a3d5fcb0
Support for reading in table of rsIDs -> dbSNP builds to back generate a dbSNP build X from a single file. Very useful indeed. dbSNP -> VC now captures the rsID in the context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2837 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 22:40:55 +00:00
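Back-generating dbSNP build X from a single newer file, as described above, amounts to a filter over a table of rsID to first-appearance build. A minimal sketch under assumptions: `DbsnpBuildFilter` and `inTargetBuild` are hypothetical names, the table is assumed to map each rsID to the build in which it first appeared, and rsIDs missing from the table are treated as not belonging to the target build.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical filter reconstructing an older dbSNP build from an rsID -> build table. */
final class DbsnpBuildFilter {
    private final Map<String, Integer> firstBuildByRsId;
    private final int targetBuild;

    DbsnpBuildFilter(Map<String, Integer> firstBuildByRsId, int targetBuild) {
        this.firstBuildByRsId = new HashMap<>(firstBuildByRsId);
        this.targetBuild = targetBuild;
    }

    /** Keeps an rsID only if it is in the table and first appeared in the target build or earlier. */
    boolean inTargetBuild(String rsId) {
        Integer build = firstBuildByRsId.get(rsId);
        return build != null && build <= targetBuild;
    }
}
```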
kcibul
28f24ca2ae
made some private member/methods protected to allow for subclassing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2836 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 21:16:00 +00:00
hanna
232d884578
Got back most of the performance lost when I fixed the dropped reads problem.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2835 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:59:56 +00:00
chartl
04a2784bf7
Initial commit of tools under development for data QC through firehose.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2834 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:13:24 +00:00
hanna
77af5822d4
Correcting my incomplete understanding of how the BAM file index actually works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2833 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 16:15:19 +00:00
depristo
5f74fffa02
Massive improvements to VE2 infrastructure. Now supports VCF writing of interesting sites, and multiple comp and eval tracks. Eric will be taking it over and expanding its functionality over the next few weeks until it's ready to replace VE1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2832 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:26:52 +00:00
ebanks
c6f6948f9d
Haiku:
Eric is a fool.
Matt found his really dumb bug.
Eric is humbled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2830 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 04:51:56 +00:00
rpoplin
ecebf0bc62
Bug fix for null pointer exception in AnalyzeAnnotations if -name argument isn't specified
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2828 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 18:39:26 +00:00
mmelgar
ad608d0e9d
Cleaned up documentation on SecondaryBaseTransitionTableWalker and added Read Group and Allele Balance to the info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2827 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 17:20:35 +00:00
hanna
34e566c90d
Fixed bug where new sharding system wasn't grabbing the reads that start at the end of a bin. Caused by what I currently believe to be a bug in Picard -- will verify with Alec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2826 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 17:00:04 +00:00
ebanks
96fee7cf7a
Disabling input of known indels for use as alternate consensuses. When we get RODs in a read traversal, it will be trivial to hook it into the cleaner (the code is already there).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2825 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 15:52:21 +00:00
ebanks
a4a2c9b172
Deal with bad input; also, N-way-out isn't the default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2823 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-11 03:44:56 +00:00
hanna
dc885ba386
Fix for some correctness bugs found during early performance testing, phase 1.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2822 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 22:32:25 +00:00
depristo
c66861746a
improvements to VE2, including more meaningful Mendelian violation counting. Support for emitting interesting sites to VCF, annotated according to the evaluations themselves. Basic integration test for VE2 started
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2819 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 16:12:29 +00:00
rpoplin
3de72daa88
Removing an accidentally added import statement.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2818 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 15:54:24 +00:00
rpoplin
0b1e243a7b
CountCovariates now sorts the list of standard covariate classes coming from PackageUtils.getClassesImplementingInterface(). As a result, some of the integration tests now make use of -standard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2817 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 15:52:20 +00:00
ebanks
6652b992f7
The new cleaner can now use known indels to create alternate consensuses for cleaning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2816 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 04:39:15 +00:00
hanna
0250338ce7
Basic use cases for merging BAM files with the new sharding system work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2815 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 22:14:37 +00:00
depristo
934d4b93a2
VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 19:02:25 +00:00
andrewk
369cc50802
Added playground walker that does a basic concordance check between two VCF files - an eval and a truth file - across all samples in the eval file. Produces per-sample, per-locus debug info and simple concordance stats. This is not meant to be extended, but rather used for validating the HapMap to VCF conversion in preparation for retiring GFF-based HapMap data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2813 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 02:41:18 +00:00
depristo
94f892ad42
VCF->beagle and VCF phasing using beagle input. Appears to work fairly well. VariantContexts now support phased genotypes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2812 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:22:05 +00:00
depristo
457568485a
simple Beagle input ROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2811 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:21:04 +00:00
hanna
57b8c9a53c
Supporting infrastructure for merging SAM files. Not yet integrated into the datasource.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2810 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 23:59:38 +00:00
kshakir
fc810a1800
Updated VCF Reader to parse VCFs according to the VCFv3.3 spec. Column headers are tab separated since sample names might have spaces.
Updated test files in /humgen/gsa-scr1/GATK_Data/Validation_Data/*.vcf to remove spaces except for when they are supposed to be in the sample name.
Added @Test before VCFReaderTest.testHeaderNoRecords()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2809 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 22:55:59 +00:00
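The VCF reader change above splits header columns on tabs so that sample names containing spaces survive. A minimal sketch of the idea, assuming the standard layout of eight fixed columns plus FORMAT ahead of the sample names; `VcfHeaderSketch` and `sampleNames` are hypothetical names, not the reader's actual API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VcfHeaderSketch {
    // #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT precede the sample names (assumed layout).
    static final int FIRST_SAMPLE_COLUMN = 9;

    // Split on tabs only, so a sample name such as "NA 12878" stays a single column.
    static List<String> sampleNames(String headerLine) {
        String[] columns = headerLine.split("\t");
        if (columns.length <= FIRST_SAMPLE_COLUMN) {
            return new ArrayList<>(); // no sample columns present
        }
        return new ArrayList<>(Arrays.asList(columns).subList(FIRST_SAMPLE_COLUMN, columns.length));
    }
}
```

Splitting the same line on arbitrary whitespace would instead turn `NA 12878` into two bogus sample names.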
chartl
935e76daa1
Minor changes to oneoff walkers. PlinkRod altered but still commented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2808 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 18:49:56 +00:00
hanna
21369869b7
Extend the regex that accepts every 'word' character so that it accepts any printable character except ':'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2807 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 03:29:55 +00:00
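The commit does not say which strings this regex matches. Purely as an illustration, assume it parses the contig name of a `contig:start-stop` interval; the sketch contrasts a word-character class with Java's `[\p{Print}&&[^:]]` intersection (any printable ASCII character except ':'). The class, patterns and interval format are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntervalRegexSketch {
    // Old style: contig names limited to word characters [a-zA-Z_0-9].
    static final Pattern WORD_ONLY = Pattern.compile("(\\w+):(\\d+)-(\\d+)");
    // New style: any printable character except ':' (class intersection syntax).
    static final Pattern PRINTABLE_NO_COLON = Pattern.compile("([\\p{Print}&&[^:]]+):(\\d+)-(\\d+)");

    // Returns the contig name, or null if the whole string does not match.
    static String contig(Pattern pattern, String interval) {
        Matcher m = pattern.matcher(interval);
        return m.matches() ? m.group(1) : null;
    }
}
```

For example, `chrUn_gl000220.1:1-10` fails the word-only pattern because of the '.', but parses with the printable pattern.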
ebanks
4fe851a83d
Optimization: don't keep scoring an alternate consensus if it's already worse than the best alt seen so far.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2806 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-07 05:06:32 +00:00
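The optimization above is a form of early termination. A minimal sketch, assuming each read contributes a non-negative mismatch cost so the running total can only grow; the class, methods and cost model are hypothetical, not the cleaner's code:

```java
import java.util.List;

class ConsensusCutoffSketch {
    // Sum per-read costs, but stop as soon as the total exceeds the best score so far.
    static long scoreWithCutoff(int[] perReadCosts, long bestSoFar) {
        long total = 0;
        for (int cost : perReadCosts) {
            total += cost;
            // Costs are non-negative, so this consensus can no longer win.
            if (total > bestSoFar) {
                return total;
            }
        }
        return total;
    }

    // Index of the lowest-cost consensus, or -1 if the list is empty.
    static int pickBest(List<int[]> costsPerConsensus) {
        long best = Long.MAX_VALUE;
        int bestIndex = -1;
        for (int i = 0; i < costsPerConsensus.size(); i++) {
            long score = scoreWithCutoff(costsPerConsensus.get(i), best);
            if (score < best) {
                best = score;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

The result is the same as scoring every read, because an abandoned consensus was already worse than the current best.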
ebanks
ca1917507f
Various improvements and fixes:
In indel cleaner:
1. allow the user to specify that he wants to use Picard's SAMFileWriter sorting on disk instead of having us sort in memory; this is useful if the input consists of long reads.
2. for N-way-out mode: output bams now use the original headers from the corresponding input bams - as opposed to the merged header. This entailed some reworking of the datasources code.
3. intermediate check-in of code that allows the user to input known indels to be used as alternate consensuses. Not done yet.
In UG: fix bug in beagle output for Jared.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2805 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-07 04:21:04 +00:00
depristo
3b1ab86d11
Added generic interfaces to RefMetaDataTracker to obtain VariantContext objects. More docs. Integration tests for VariantContexts using dbSNP and VCF. At this stage, if your walkers use only dbSNP or VCF files, please move them over to VariantContext; it's just nicer. If you've got RODs that implemented the old variation/genotype interfaces, and you want them to work in new walkers, please add an adaptor to VariantContextAdaptors in the refdata package. It should be easy and will reduce the burden in the long term when those interfaces are retired.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2803 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:26:06 +00:00
depristo
995d55da81
now uses the new RMDT getVariantContext() functions instead of doing the work itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2802 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:23:06 +00:00
depristo
33760834d6
commented out code that was inactive (due to string ==) but actually incorrect. Sometimes two wrongs do make a right
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2801 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:22:26 +00:00
hanna
c7e006a996
Bug fixes for interval batching in the sharding system. The sharding system now batches intervals and passes
basic tests for small and large intervals and intervals that cross bin boundaries. Currently works
only with a single BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2800 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 21:47:54 +00:00
asivache
a1d5a384f4
Reverting the last reversal. bestConsensus points to something also kept in a set, so just reassigning it will NOT automatically destroy the underlying data; explicit clearing of unneeded data reinstated. STUPIDO!!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2796 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:08:53 +00:00
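The reasoning in the commit above, that reassigning `bestConsensus` does not free data still referenced from a set, is ordinary Java reachability: an object stays alive while any collection holds it. A minimal sketch with hypothetical names (`Consensus`, `replaceBest`), not the cleaner's actual classes:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ConsensusMemorySketch {
    // Hypothetical alternate consensus carrying bulky per-read data.
    static final class Consensus {
        List<Integer> readData = new ArrayList<>();

        Consensus(int nReads) {
            for (int i = 0; i < nReads; i++) {
                readData.add(i);
            }
        }

        // Drop the reference to the old list so it can be collected,
        // even though this (small) object itself stays in the set.
        void clearData() {
            readData = new ArrayList<>();
        }
    }

    // Switch to a new best, explicitly clearing the old best's data.
    static Consensus replaceBest(Consensus oldBest, Consensus newBest) {
        if (oldBest != null && oldBest != newBest) {
            oldBest.clearData();
        }
        return newBest;
    }

    // Returns {old best's data size after plain reassignment, after replaceBest}.
    static int[] demo() {
        Set<Consensus> all = new LinkedHashSet<>();
        Consensus a = new Consensus(10);
        Consensus b = new Consensus(10);
        all.add(a);
        all.add(b);

        Consensus best = a;
        best = b;                     // `all` still holds `a`, so its data stays reachable
        int afterReassign = a.readData.size();

        best = replaceBest(a, best);  // explicit clearing releases a's data
        int afterClear = a.readData.size();
        return new int[] {afterReassign, afterClear};
    }
}
```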
asivache
cf7e6d0c0b
Memory-saving change, same as in old IntervalCleaner (if alt consensus does not beat the best one, destroy its data immediately)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2795 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:05:04 +00:00
asivache
df0be25afb
Oops, no need to destroy the old best's data explicitly; it will be done automatically, of course
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2794 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 18:03:16 +00:00
asivache
9f44018b7d
Reducing memory footprint: if an alt consensus does not beat the best alt observed so far, destroy its data immediately instead of keeping it around. If the new alt is better than the old best, then destroy the old best right away instead.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2793 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 17:58:54 +00:00
rpoplin
be33d1852c
Reverting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2792 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:57:09 +00:00
depristo
af8c47fc2f
Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes is now stable thanks to sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts are now officially supported. Other important types can be added to the adaptor system in the refdata package. Integration tests later today
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:42:54 +00:00
rpoplin
0d8d6e0a14
Ti/Tv module in VariantEval shows known and novel ratios if possible
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2790 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:37:40 +00:00
depristo
c6d86da4b8
almost managed to move things around perfectly in one go
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:18:26 +00:00
depristo
e0af3bf761
updating back names
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2786 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:53:45 +00:00
depristo
777617b6c7
managed to actually move the files too! Damn you svn
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2785 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:47:19 +00:00
depristo
8938a4146d
moving varianteval2 to its own dir
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2784 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:37:04 +00:00
depristo
69132c81aa
Documentation. Plus nicer structure to adaptors. Intermediate checkin before move into core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2783 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:33:27 +00:00
hanna
e53432d54d
Checkpoint for combining adjacent intervals into the same shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2782 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 02:48:02 +00:00
asivache
0d347d662a
More plumbing: if, after the shift, the window contains indel(s) at the first position, do not throw an exception; just print a warning (we cannot deal with this situation!!) and discard those indels without trying to call them. This situation will most probably arise after a forced shift over a messy region anyway.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2781 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 21:06:28 +00:00
depristo
1d86dd7fd1
Interface changes following Matt's advice. VariantContexts are now immutable, and there are special mutable versions in case you need to change things. AttributedObject is now an InferredGeneticContext and is package-protected. VariantContexts are now named, which makes them easier to use with the ROD system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2780 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 20:55:49 +00:00
asivache
e7b710791f
OK, we finally ran into a messy dataset where we cannot find a place to shift the window to: there's an indel at every position. Don't panic, don't throw an exception; just ignore the whole window completely, since we do not want to call there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2779 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:49:56 +00:00
asivache
152f65b362
Do not die in --cycleOnly mode when the lane is not paired-end; just count all single-end base quals into the first column and leave the second column filled with 0s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2778 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:48:12 +00:00
asivache
a3cd56897d
moving older versions of the oneoff project to archive, bye-bye
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2777 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:46:27 +00:00
asivache
f7e7bcd2ef
Oneoff project, totally unrelated to anything
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2776 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 19:44:50 +00:00
hanna
334da80e8b
Fixed Mark's bad checkin.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2775 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 12:40:58 +00:00
depristo
1ce0f06216
temp checkin for reorganization
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2774 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 11:10:24 +00:00
ebanks
83b9d63d59
1. Added functionality to the data sources to allow engine to get mapping from input files to (merged) read group ids from those files.
2. Used said mapping to implement N-way-in,N-way-out functionality in the new indel cleaner. Still needs more testing (to be done after vacation but preliminary tests look good).
3. Fixes to VCF validator: ignore case when testing VCF reference base against true reference base and allow quals of -1 (as per spec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2773 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 04:12:49 +00:00
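The two VCF validator fixes in item 3 above reduce to small checks. A minimal sketch, assuming single-character bases and a numeric QUAL; the names are hypothetical, and the soft-masking motivation for ignoring case is an assumption, not something the commit states:

```java
class VcfValidatorSketch {
    // Compare bases case-insensitively (e.g. a lower-case, soft-masked reference base
    // should still match an upper-case VCF REF base; motivation assumed).
    static boolean refBaseMatches(char vcfRef, char referenceBase) {
        return Character.toUpperCase(vcfRef) == Character.toUpperCase(referenceBase);
    }

    // Non-negative quals are valid; -1 is also accepted, as the commit says the spec allows.
    static boolean qualIsValid(double qual) {
        return qual >= 0 || qual == -1;
    }
}
```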
rpoplin
210c4c9913
AnalyzeAnnotations now makes plots for the value in the QUAL column as if it were an annotation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2771 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 20:33:15 +00:00
hanna
3f35e181d5
Add an alternate implementation of the BAM file reader that keeps the entire index in memory. Initial revision of BAMFileStat, a tool to inspect BAM file BGZF blocks and index entries.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2769 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 19:48:15 +00:00
depristo
c89ba7b1a4
improvements to VariantEval2. Now has Ti/Tv calculations and Mendelian violation detection support. We only make ~80 Mendelian violations in 380K calls for the YRI trio, in case you are interested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2768 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-03 16:03:19 +00:00
depristo
fa2cd432fd
better printing in VE2. Added support for TiTv analysis
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2766 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 21:20:29 +00:00
depristo
cbbc0e98d2
fix for broken imports
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2765 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 15:20:27 +00:00
depristo
681c196097
V2 of VariantEval2. The framework is essentially complete, and very simple and clear now compared to VE1. Support for any number of JEXL expressions. dbSNP% evaluation added to show paired comparison evaluation. Pretty-printed output tables. Performance is poor but can easily be fixed (see todo notes).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2764 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 14:18:46 +00:00
hanna
9dbdfff786
Moved VariantEval to core. Updated integration test md5s to reflect new Analysis class names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2762 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-02 00:22:15 +00:00
asivache
4ddbaeed07
In an attempt at reuse: --pairCountsOutput is now optional (if not specified, only per-locus statistics are collected); --silent: do not echo results to stdout; --minMapQ: count only bases coming from reads mapped with the specified quality or better; --blacklistedlanes: do not count reads/bases coming from specific lanes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2761 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 22:05:19 +00:00
chartl
2c4f709f6f
Bunch of oneoff stuff that I don't want to lose. Also:
VCFRecord - "." dbsnp-ID entries now taken into account (thought these were represented as null; but I guess not)
VCFGenotypeRecord - added a replaceFormat option; since intersecting Broad/BC call sets required genotype formats also be intersected (no changing on-the-fly)
VCFCombine - altered doc to instruct user to give complete priority list (was throwing exception if not)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2760 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 21:35:10 +00:00
asivache
421282cfa3
Convenience method: getMappingFilteredPileup(int minMapQ)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2759 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 21:19:53 +00:00
ebanks
506d39f751
The UG calculations are now driven by an independent engine.
This completely separates the genotyper walker from other walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2758 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 20:57:31 +00:00
hanna
d8e75cf631
Fix for Kiran's memory issue running UG...turned out to be a particularly bad interaction between @By(Reference) traversals and TreeReduce.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2757 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 20:27:06 +00:00
depristo
d9671dffba
Documentation for VariantContext. Please read it and start using it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2756 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 17:49:51 +00:00
asivache
990af3f76e
Will now work with simplest tabular format - genotype string ("+ACTT") does not have to be followed by ':'
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2755 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 15:40:01 +00:00
ebanks
e0808e6c37
Moved old EM model to archive
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2754 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 02:55:32 +00:00
rpoplin
64fc76e4bf
Added an option to AnalyzeCovariates to set the max value of the histograms to make them easier to directly compare.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2753 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 23:13:57 +00:00
ebanks
f6da57dc79
1. For Matt: JIRA GSA-270. Other walkers needing to call into the Unified Genotyper now use static methods (e.g. runGenotyper()) instead of calling initialize and map.
2. Set the default confidence cutoff to 50 (instead of 0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2752 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 21:14:57 +00:00
ebanks
ce9d3dcefb
Removing deprecated version of indel genotyper (putting it in archive in case we need to reproduce original 1KG indel calls for some reason).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2749 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-31 14:05:36 +00:00
depristo
3d45457595
VariantEval2 test framework implemented; Kiran is experimenting with the system. Not for use by anyone else. VariantContext appears to work well; I'll release it next week for general use following docs of the functions. Removing newvarianteval and other classes to avoid any future confusion. Update to TraverseLoci and RodLocusView to simplify a few functions and to correct some minor errors. All tests pass without modification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2748 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 20:51:24 +00:00
chartl
236764b249
Major (and useful) changes to MultiSampleConcordance:
1) Now cares about Genotype filtering. If a genotype is flagged as filtered, it can still count as an FP/FN/TP, but it goes into a "non-confident genotype" bin rather than het/hom.
2) Can give it a Genotype Confidence flag (-GC) which will automatically filter genotypes in the way above for quality > Q for "-GC Q"
3) Can give it an -assumeRef flag. For sites only in the truth VCF (that don't even appear in the variant VCF), that locus will be treated as confident
ref calls for all individuals in the variant VCF; and the calculators updated accordingly.
*** Important: Default behavior is that sites unique to the truth VCF are considered no-call sites for the variant. This flag can help get around that;
however the safest way to run this is to have a variant VCF with calls at each and every locus, if that is possible.
VCFGenotypeRecord -- added an isFiltered() call to automate looking up the FILTERED flag for VCF v3.3
SimpleVCFIntersectWalker - basic outline for a walker I'm working on tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2747 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 01:18:31 +00:00
jmaguire
ea7e737441
Two new annotations:
1. LowMQ: fraction of reads at MQ=0 or MQ<=10.
2. Alignability: annotate SNPs with Heng's (or anyone else's) alignability mask.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2746 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 23:23:00 +00:00
chartl
97f60dbc4b
Moving stuff around. ( core;playground ) ----> ( oneoffs ). I've been a bad boy, sullying the core codebase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2745 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 22:50:03 +00:00
rpoplin
16da5011c0
Added a new option for indicating the mean number of variants on the AnalyzeAnnotations plots. This way one can say, for example, filtering at this point will keep 75 percent of all the variants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2744 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 21:58:31 +00:00
hanna
668c7da33d
Bug fix in custom override of queryOverlapping.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2743 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 21:35:59 +00:00
rpoplin
c6cc844e55
Added -name argument to AnalyzeAnnotations that allows one to specify the name of the annotation to be used on the plots. Instead of seeing AB and DP, one can add -name AB,AlleleBalance -name DP,Depth
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2742 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:48:53 +00:00
depristo
62a80f2b6f
fixed out of date tests. Also, tests uncovered a subtle bug in new implementation that was also fixed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2741 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:03:48 +00:00
rpoplin
4f29a1d4f6
AnalyzeAnnotations now plots true positive rate instead of percentage of variants found in the truth set. Committing GCContentCovariate to help people experiment with correcting the pilot3/Kristian base calling error mode in slx.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2740 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 20:01:56 +00:00
aaron
ac2a207b0b
added a wrapper exception for anything that goes wrong in VCF parsing; this way the problematic file line is emitted, no matter what happens. Makes debugging a lot easier, especially in large files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2739 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 19:58:51 +00:00
hanna
e7f5c93fe5
Cleaning up the inheritance hierarchy from the previous commit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2738 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 19:13:36 +00:00
depristo
88495a39d4
better formatting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2737 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:38:21 +00:00
depristo
1993472b38
Just like VariantFiltration but lets you match info fields out of the VCF instead of annotating them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2736 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:38:03 +00:00
depristo
0a7426c29c
Computes SNP density over the genome. Doesn't work with intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2735 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:36:49 +00:00
depristo
9decd20f46
Fix to priors to allow lower het values for mouse guys; no integration test changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2734 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:36:12 +00:00
chartl
d57a86ad41
Not nearly as badass as it looks. The problem I mentioned yesterday with "bleeding in" of samples comes from VCFUtils and SampleUtils looking for all VCF-class RODs in the tracker and stealing the name from them. I have introduced a new HapmapVCF-type ROD for use
when you want to protect your VCF header from being infected by the samples in a bound HapMap VCF. Changes are as follows:
VCFRecord - minor change to adapt isNovel() to the case where the dbsnp ID field is empty, but the info field has DB=1
HapmapVCFRod - introduced for the reason at the top
RODRecordIterator - was: catch ( Exception e ) { throw new StingException("long ass message") }
is now: catch ( Exception e ) { throw new StingException("long ass message",e) }
to permit full stack ejaculation.
RodVCF - Now with more brackets!
ReferenceOrderedData - registering HapmapVCF as a bindable string
VariantAnnotator - There's an extra space on a line. And some new brackets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2733 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 15:19:50 +00:00
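The `RODRecordIterator` change above is standard Java exception chaining: passing the caught exception as the cause keeps its stack trace attached. A minimal sketch using a hypothetical `WrappedException` in place of `StingException` and a hypothetical integer-parsing step:

```java
class ExceptionChainingSketch {
    // Hypothetical stand-in for StingException: an unchecked exception that can carry a cause.
    static class WrappedException extends RuntimeException {
        WrappedException(String message) {
            super(message);
        }

        WrappedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static void parseOrThrow(String field, boolean keepCause) {
        try {
            Integer.parseInt(field);
        } catch (NumberFormatException e) {
            if (keepCause) {
                // "after" style: the original exception is reachable via getCause()
                throw new WrappedException("Unable to parse record field: " + field, e);
            }
            // "before" style: the original stack trace is lost
            throw new WrappedException("Unable to parse record field: " + field);
        }
    }
}
```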
depristo
5aaf4e6434
VariantFiltration now accepts any number of --name --filter expressions, and annotates the VCF file with each name that matches. Very useful
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2732 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 12:13:08 +00:00
ebanks
01e73fc39e
Yuck - Picard's SAMRecord Comparator only deals with mapped reads. Adding an extended version that works for all reads.
After adding some more minor changes to the new realigner it now gets the same exact results as the original version - except that sometimes it doesn't clean when it shouldn't!
More testing coming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2731 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 07:49:47 +00:00
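A comparator that handles every read, mapped or not, can be sketched as follows. It uses a hypothetical minimal `Read` type rather than Picard's `SAMRecord`, and assumes, as a simplification, that unmapped reads carry reference index -1 and should sort after all mapped reads:

```java
import java.util.Comparator;

class ReadComparatorSketch {
    // Minimal hypothetical stand-in for a SAM record; referenceIndex -1 marks an unmapped read.
    static final class Read {
        final String name;
        final int referenceIndex;
        final int alignmentStart;

        Read(String name, int referenceIndex, int alignmentStart) {
            this.name = name;
            this.referenceIndex = referenceIndex;
            this.alignmentStart = alignmentStart;
        }

        boolean isUnmapped() {
            return referenceIndex < 0;
        }
    }

    // Mapped reads by (reference index, alignment start); unmapped reads after all mapped
    // ones; remaining ties broken by read name.
    static final Comparator<Read> ALL_READS = (a, b) -> {
        if (a.isUnmapped() != b.isUnmapped()) {
            return a.isUnmapped() ? 1 : -1;
        }
        if (!a.isUnmapped()) {
            int byReference = Integer.compare(a.referenceIndex, b.referenceIndex);
            if (byReference != 0) {
                return byReference;
            }
            int byStart = Integer.compare(a.alignmentStart, b.alignmentStart);
            if (byStart != 0) {
                return byStart;
            }
        }
        return a.name.compareTo(b.name);
    };
}
```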
hanna
3d922a019f
Basic support for very simple index-driven locus traversals. The interface has been changed to
support batched intervals in a single shard, but intervals are not yet compressed into a single
shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2730 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 03:14:26 +00:00
asivache
4810e9c9cd
And now the DOCS!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2729 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 23:21:33 +00:00
asivache
40262e2070
Now calls single-sample indels too, with all the V2-level stats and bells. This officially obsoletes IndelGenotyperWalker (V1). In addition, alignments spanning beyond the contig end are now completely ignored (with a user warning); this applies to both single-sample and paired (somatic) calls. You just wait, Eric, I'll get you the docs with the next commit!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2728 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 22:28:02 +00:00
rpoplin
79c4cc1db7
AnalyzeAnnotations now breaks out Ti/Tv by calls in HapMap and also plots true positive rates. Any ROD passed in whose name starts with 'truth' is considered to be part of the truth set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2726 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:41:23 +00:00
chartl
7a10c40fb3
Much clearer (and, like, not totally incorrect) implementation of isNovel
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2725 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:16:21 +00:00
chartl
8de6a8d246
Lots of changes; all to do something relatively minor.
1) Changed VCF/RodVCF to allow for inquiries to whether or not the site is novel; isNovel() looks at the ID field, and those members of the info field that indicate membership in dbsnp, hapmap2, or hapmap3; and if none can be found, returns true.
2) Changed VariantAnnotator to annotate hapmap2 and hapmap3, if you bind rods to it with those names. Works in the same way as DBSNP does -- if you give it a rod named "hapmap2" it'll annotate membership in it. -- Passes integration tests
3) Changed UnifiedGenotyper to do the same thing (since it uses Annotations as a subroutine) -- Passes integration tests
4) Changed MultiSampleConcordanceWalker to take a flag --ignoreKnownSites (or -novels) to examine concordance only on sites that are not marked as in dbSNP or in Hapmap in the variant VCF
5) Changed VCFConcordanceCalculator (the object MultiSampleConcordanceWalker runs on) to output Concordant_Het_Calls and Concordant_Hom_Calls separately, rather than combined as Concordant_Calls
6) AlleleBalanceHistogramWalker -- I don't know what I did to this thing. I've been jerry-rigging System.outs to do stuff it was never really intended to do, so there's probably some dumb System.out.print("HI I AM AT LOCUS:"+loc) stuck somewhere. It compiles at any rate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2724 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:06:56 +00:00
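The `isNovel()` rule in item 1 above can be sketched as follows. The INFO keys `DB`, `H2` and `H3`, and treating a value of "0" as non-membership, are assumptions; the commit does not name the fields, and the class is hypothetical:

```java
import java.util.Map;

class NoveltySketch {
    // Assumed INFO keys marking membership in dbSNP, HapMap2 and HapMap3.
    static final String[] KNOWN_SITE_KEYS = {"DB", "H2", "H3"};

    // Novel unless the ID column holds an identifier (anything other than "." or empty)
    // or one of the known-site INFO keys is present with a value other than "0".
    static boolean isNovel(String id, Map<String, String> info) {
        if (id != null && !id.isEmpty() && !id.equals(".")) {
            return false;
        }
        for (String key : KNOWN_SITE_KEYS) {
            String value = info.get(key);
            if (value != null && !value.equals("0")) {
                return false;
            }
        }
        return true;
    }
}
```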
ebanks
6f11fe442a
Sync with Andrey's changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2723 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 20:49:38 +00:00
asivache
db429e1096
Some alt consensuses may have a cigar string starting with an insertion. Not a bug, strictly speaking, since the cleaner had been detecting this and crashing deliberately. Now it knows how to deal with this special case, though. Also, uppercase the ref before using it in the SW aligner!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2722 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 18:53:02 +00:00
depristo
956b570c8e
V5 improvements to VariantContext. Now fully supports genotypes. Filtering enabled. Significant tests throughout system. Support for rebuilding variant contexts from subsets of genotypes. Some code cleanup around repository
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2721 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 18:37:17 +00:00
depristo
9876645a5d
Now drives the walker by reference, not by reads, so we see even loci with no reads. This allows us to accurately calculate the true total callable area
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2720 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 11:12:46 +00:00
ebanks
1dd9996f3a
New realigner now completely uses bytes, plus misc fixes. Still not ready for use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2719 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 04:17:20 +00:00
depristo
f6bca7873c
V3 of VariantContext. Support for Genotypes and NO_CALL alleles. QUAL fields fully implemented. Can parse VCF records and dbSNP. More complete validation. Detailed testing routines for VariantContext and Allele.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2718 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 04:10:16 +00:00
chartl
23fc9737b4
Added the ability to filter out variant (not truth) calls based on read depth. Using -NLD 5 will not update concordant counts for calls with 0, 1, 2, 3, or 4 reads supporting them. Not to be used with VCF files that do not have DP in the format field.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2716 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 23:28:04 +00:00
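The -NLD behaviour from r2716 can be sketched as follows, assuming each variant call exposes its DP value as an int and a flag saying whether it matched the truth VCF. DepthFilter and Call are hypothetical names, not the walker's classes.

```java
import java.util.List;

// Hypothetical sketch of the -NLD filter; DepthFilter and Call are illustrative names.
final class DepthFilter {
    static final class Call {
        final int depth;          // DP from the FORMAT field
        final boolean concordant; // whether the call agreed with the truth VCF

        Call(int depth, boolean concordant) {
            this.depth = depth;
            this.concordant = concordant;
        }
    }

    /** With minDepth = 5, depths 0 through 4 are rejected. */
    static boolean passes(int depth, int minDepth) {
        return depth >= minDepth;
    }

    /** Counts concordant calls, ignoring calls supported by too few reads. */
    static int countConcordant(List<Call> calls, int minDepth) {
        int count = 0;
        for (Call call : calls) {
            if (passes(call.depth, minDepth) && call.concordant) {
                count++;
            }
        }
        return count;
    }
}
```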
chartl
1b9184a1c7
Added a multisample concordance walker which takes the place of the VCF python library I've been using. Takes a truth VCF and a variant VCF and outputs a TSV that looks like this:
Sample_ID Concordant_Refs Concordant_Vars Homs_called_het Het_called_homs False_Positives False_Negatives_Due_To_Ref_Call False_Negatives_Due_To_No_Call
NA19381 491 294 2 0 0 0 1
NA19451 489 298 1 0 0 0 0
NA19463 486 289 2 3 1 4 3
NA19376 488 296 1 0 2 0 1
NA19317 489 284 5 3 3 3 1
This walker will be merged with GenotypeConcordance once it's clear how to do so.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2715 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 22:59:17 +00:00
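One way the columns in the TSV header of r2715 could be assigned per sample is sketched below. The genotype enum and the decision to skip sites where the truth is a no-call, or where a hom-ref truth meets a no-call, are assumptions rather than behaviour taken from MultiSampleConcordanceWalker.

```java
import java.util.Optional;

// Hypothetical sketch of per-sample concordance bucketing; not the walker's code.
final class Concordance {
    enum Gt { NO_CALL, HOM_REF, HET, HOM_VAR }

    enum Category {
        CONCORDANT_REFS, CONCORDANT_VARS, HOMS_CALLED_HET, HET_CALLED_HOMS,
        FALSE_POSITIVES, FALSE_NEGATIVES_DUE_TO_REF_CALL, FALSE_NEGATIVES_DUE_TO_NO_CALL
    }

    /** Maps a (truth, call) genotype pair to a TSV column; empty means "not counted". */
    static Optional<Category> classify(Gt truth, Gt call) {
        if (truth == Gt.NO_CALL) {
            return Optional.empty(); // nothing to compare against
        }
        if (truth == Gt.HOM_REF) {
            if (call == Gt.HOM_REF) return Optional.of(Category.CONCORDANT_REFS);
            if (call == Gt.NO_CALL) return Optional.empty(); // assumed: uncounted
            return Optional.of(Category.FALSE_POSITIVES);
        }
        // Truth is HET or HOM_VAR from here on.
        if (call == Gt.NO_CALL) return Optional.of(Category.FALSE_NEGATIVES_DUE_TO_NO_CALL);
        if (call == Gt.HOM_REF) return Optional.of(Category.FALSE_NEGATIVES_DUE_TO_REF_CALL);
        if (call == truth) return Optional.of(Category.CONCORDANT_VARS);
        return Optional.of(truth == Gt.HOM_VAR ? Category.HOMS_CALLED_HET : Category.HET_CALLED_HOMS);
    }
}
```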
asivache
bd11060e72
Oops, I did it again. Fixing the bug introduced in a previous commit: use the correct length of the indel event.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2713 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 21:51:54 +00:00
ebanks
fddca032bb
Initial commit of v2.0 of the cleaner. DO NOT USE. (this means you, Chris)
Cleaned up SW code and started moving over everything to use byte[] instead of String or char[].
Added a wrapper class for SAMFileWriter that allows for adding reads out of order.
Not even close to done, but I need to commit now to sync up with Andrey.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2712 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 21:36:42 +00:00
rpoplin
b8ae083d1b
AnalyzeAnnotations creates a plot of dbsnp rate as a function of the annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2711 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 21:08:33 +00:00
rpoplin
3999a8d2c8
IntelliJ no longer complains that my methods are too complex to analyze.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2708 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 20:12:13 +00:00
rpoplin
fc4285f9fd
AnalyzeAnnotations seems to be popular so I've rewritten the guts to be easier to extend and maintain.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2707 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 19:30:31 +00:00
hanna
fa3589e5c5
Update our error messages to point to getsatisfaction.com/gsa.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2706 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 19:16:28 +00:00
depristo
3399ad9691
Incremental update 2 -- refined allele and VariantContext classes; support for AttributedObject class; extensive testing for Allele class, and partial for VariantContext. Now possible to easily convert dbSNP to VariantContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2705 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 17:19:37 +00:00
asivache
3edcefb7fb
add _gI and _gD to the indel probe names according to the spec (in the hope that the wiki is not obsolete); added optional cmd line param -project_id to prefix all probe names with.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2704 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 17:06:49 +00:00
chartl
ed9b7edee3
Changed " to ' to stop the following javadoc warnings on compile:
[javadoc] /humgen/gsa-scr1/chartl/sting/java/src/org/broadinstitute/sting/oneoffprojects/variantcontext/VariantContext.java:99: warning: unmappable character for encoding ASCII
[javadoc] * if one of the alleles is deleted (?-?).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2703 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 15:23:55 +00:00
depristo
40c242d2b8
Fix for overflow issues
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2702 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 13:37:16 +00:00
aaron
8453676b71
added a method to AlignmentContext called hasExceededMaxPileup, which you can use to determine if the current site exceeded the maximum pileup size (reads were dropped). Added this as a check to unified genotyper according to Eric's instructions, and added the plumbing to the engine.
Also deleted the FixBamSortOrder package that isn't used anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2701 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 05:17:01 +00:00
rpoplin
4bcdab580c
--output_dir has been changed to --output_prefix to give the user more control over the names of the resulting mass of files in AnalyzeAnnotations. The font size of the axes is increased. Cumulative filtering plots are removed since the binned filtering plots are much more useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2700 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 04:50:54 +00:00
chartl
df112e64b8
Minor tweaks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2699 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 04:17:47 +00:00
ebanks
476d6f3076
RealignerTargetCreator is officially live
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2697 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 03:41:52 +00:00
asivache
1f64c5d41a
Do not slurp the whole set of snp mask sites into memory (gets pretty heavy on full dbSNP!); instantiate a private ROD iterator instead and drag it across the sites we are designing probes for.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2694 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 22:39:46 +00:00
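A sketch of the streaming approach from r2694: rather than loading every mask site, keep one sorted iterator and drag it along as the probe windows move. It assumes a single contig, positions as longs, and probe windows queried in non-decreasing start order; MaskSweep is a hypothetical name, and the small buffer only holds sites that a later overlapping window may still need.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: sweep a sorted mask track alongside sorted probe windows.
final class MaskSweep {
    private final Iterator<Long> maskSites;                 // sorted positions, one contig
    private final Deque<Long> buffer = new ArrayDeque<>();  // read, and maybe still needed

    MaskSweep(Iterator<Long> maskSites) {
        this.maskSites = maskSites;
    }

    /** Mask positions in [start, end]; call with non-decreasing start values. */
    List<Long> sitesIn(long start, long end) {
        // Sites before this window can never be needed again.
        while (!buffer.isEmpty() && buffer.peekFirst() < start) {
            buffer.pollFirst();
        }
        // Read ahead only until one site lies past the window end.
        while ((buffer.isEmpty() || buffer.peekLast() <= end) && maskSites.hasNext()) {
            long pos = maskSites.next();
            if (pos >= start) {
                buffer.addLast(pos);
            }
        }
        List<Long> hits = new ArrayList<>();
        for (long pos : buffer) {
            if (pos > end) break;
            hits.add(pos);
        }
        return hits;
    }
}
```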
ebanks
47440bc029
- Removed max_coverage argument from UG; Aaron will set it up so that we don't call when the GATK had to drop reads.
- Reimplemented optimization in UG to not call when there are no non-ref bases.
- Compute reference confidence accurately in UG for ref calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2693 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 21:56:33 +00:00
chartl
2c8d7b0c44
Forgot the onTraversalDone. That was dumb.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2692 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 21:02:46 +00:00
chartl
04e1832968
Added - AlleleBalanceHistogramWalker -- hopefully this'll be able to tell us very clearly whether bad genotype concordance is a result of systematic contamination (consistent wonky allele balances)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2691 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 20:57:12 +00:00
rpoplin
a1054efe8a
Platform and read group are no longer assigned default values. The recalibrator throws an exception if needed values are empty in the bam file and the args weren't set by the user. This is done to make it more obvious to the user when the bam file is malformed. Similarly, the recalibrator now refuses to recalibrate any solid reads in which it can't find the color space information, with an exception message explaining this. The recalibrator no longer maintains its own version number and instead uses the new global GATK version number.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2690 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 18:47:40 +00:00
rpoplin
0345d9f6a5
Updating the recalibrator to use the non-deprecated getPileup() method. Adding documentation to AnalyzeAnnotations so that the walker isn't marked as unclean at compile time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2688 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 14:15:09 +00:00
depristo
c231547204
Refactoring and migration of new allele/variantcontext/genotype code into oneoffprojects. NOT FOR USE. PlinkRod commented out due to dependence on this new, rapidly changing interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2687 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-26 13:53:29 +00:00
aaron
2e57bc7879
added a better message for the SO flag error in MergingSAMIterator2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2685 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 22:57:18 +00:00
rpoplin
24d4082925
AnalyzeAnnotations can now process only variants that are found in samples that match the -sampleName argument. X-axis of plots no longer use annoying scientific notation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2684 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 20:52:11 +00:00
hanna
022601b1a5
Warnings for walkers w/o Javadoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2683 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 20:34:50 +00:00
rpoplin
894a2b511b
Fixing no platform warning message.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2682 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:46:50 +00:00
rpoplin
2b51cf18f0
AnalyzeAnnotations now outputs plots with a log x-axis in addition to the standard x-axis so that things like DP and MQ0 are easier to see. AnalyzeAnnotations now skips all annotations that aren't floating-point values. The recalibrator now warns users if PL tags are missing and it is therefore reverting to illumina.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2681 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:39:18 +00:00
asivache
6cf413e630
Bug: ExpandedSAMRecord did not treat hard-clipped bases ('H') correctly. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2680 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:23:44 +00:00
ebanks
dc170caafc
Now, if a dbsnp rod is passed to either the UnifiedGenotyper or VariantAnnotator, a DB=0/1 annotation is added (in addition to filling in the ID field); this is in line with 1KG project calls. If no dbsnp rod is used, the annotation is not added (as opposed to setting every entry to DB=0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2678 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 17:27:12 +00:00
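The DB=0/1 rule from r2678, sketched with the INFO field modelled as a plain Map (an assumption). The annotation is written only when a dbSNP rod is bound, so runs without one do not get DB=0 on every record; DbsnpAnnotation is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the DB=0/1 rule; INFO is modelled as a plain map.
final class DbsnpAnnotation {
    /** Adds DB only when a dbSNP rod is bound; rsId is null when the site is not in dbSNP. */
    static Map<String, Object> annotate(Map<String, Object> info, boolean dbsnpBound, String rsId) {
        Map<String, Object> out = new LinkedHashMap<>(info);
        if (dbsnpBound) {
            out.put("DB", rsId != null ? 1 : 0);
        }
        return out;
    }
}
```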
rpoplin
5d2f8aaa54
Updating recalibrator version number after the several emergency changes last week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2677 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 14:35:47 +00:00
jmaguire
588417e17d
Don't reference that optimization library I'm not using anyway.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2676 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:30:50 +00:00
jmaguire
d3e3c1c2e0
don't require that optimization lib that I'm not using yet... (doh)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2675 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:28:21 +00:00
jmaguire
1d6d2b26f7
tools for optimizing calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2674 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:16:55 +00:00
jmaguire
877957761f
lots of new stuff, some generally useful, some one-off.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2673 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 19:50:48 +00:00
ebanks
78890c0bee
First version of walker that combines the functionality of IndelIntervalWalker, MismatchIntervalWalker, SNPClusterWalker, and IntervalMergerWalker - plus it allows the user to input rods containing known indels (e.g. dbSNP or 1KG calls) for automatic cleaning. Basically, all pre-processing steps for cleaning are now done in a single pass.
More testing needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2672 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 05:32:38 +00:00
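The interval-merging step that r2672 folds into a single pass (formerly IntervalMergerWalker) could look like this minimal sketch. It assumes 1-based inclusive intervals and merges intervals that overlap or abut; contigs are ordered by name here purely for simplicity, whereas a real tool would follow the reference sequence dictionary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of interval merging; not the walker's code.
final class IntervalMerge {
    static final class Interval {
        final String contig;
        final long start; // 1-based, inclusive
        final long stop;  // inclusive

        Interval(String contig, long start, long stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }
    }

    /** Sorts intervals, then merges any that overlap or abut on the same contig. */
    static List<Interval> merge(List<Interval> input) {
        List<Interval> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Interval iv) -> iv.contig).thenComparingLong(iv -> iv.start));
        List<Interval> merged = new ArrayList<>();
        for (Interval cur : sorted) {
            int last = merged.size() - 1;
            if (last >= 0) {
                Interval prev = merged.get(last);
                if (prev.contig.equals(cur.contig) && cur.start <= prev.stop + 1) {
                    merged.set(last, new Interval(prev.contig, prev.start, Math.max(prev.stop, cur.stop)));
                    continue;
                }
            }
            merged.add(cur);
        }
        return merged;
    }
}
```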
chartl
d6b9b788a8
Renamed -- PlinkRodWithGenomeLoc --> PlinkRod
Since binary files do not need encoded locus information in the SNP names there's no need to suggest that it is so in the name of the rod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2671 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 18:19:28 +00:00
chartl
ae22d35212
PlinkRod now correctly parses binary files without indels; unit test added for this behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2669 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 17:34:06 +00:00
chartl
94dc09c865
PlinkRod now successfully instantiates on the binary ped file trio (.bed, .bim, .fam) for non-indel files.
Upcoming: Test that the instantiation is correct, do it for indel-containing files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2668 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 16:13:24 +00:00
chartl
01db93299c
PlinkRodWithGenomeLoc now properly handles indels.
There is now a DELETION_REFERENCE allele type to allow for the storage of multi-base references rather than point-mutation references.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2667 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 07:34:52 +00:00
chartl
42fb85e7f3
PlinkRodWithGenomeLoc now properly parses text plink files. Unit test added to test this functionality. Indels and binary files to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2666 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 06:19:26 +00:00
depristo
c871a0f221
UG map() now returns a VariantCallContext object. Also has a field for confidentlyCalledBases. UG reduce() emits statistics on the confident called % of bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2664 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 23:06:43 +00:00
chartl
fbf82526cb
Minor renaming changes.
PlinkRodWithGenomeLoc now supports .bed file parsing (and doesn't require |c#_p# conventions for SNPs -- still requires _g[I/D] for indels)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2663 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 23:06:32 +00:00
rpoplin
fd223e955c
Reverting the previous solid change. We now refuse to recalibrate if the solid read doesn't contain proper color space information. The exception message has been updated to say this. Also, Tile has been downgraded to an ExperimentalCovariate due to performance issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2662 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:55:28 +00:00
rpoplin
7732f98e56
Fix for Solid reads that have '.' in their color space field. The recalibrator will just set them to be illumina reads and won't apply color space correction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2661 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:09:16 +00:00
aaron
2ea768d902
ant clean is your friend....fixed test code dependent on an interface change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2660 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:07:46 +00:00
rpoplin
a11503819a
AnalyzeAnnotations now breaks out its TiTv plots into novel SNPs, dbSNP sites, and combined.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2659 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 19:00:23 +00:00
aaron
cc3b818268
cleanup of the pile-up limit exceeded warning, and a little code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2657 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 22:17:24 +00:00
ebanks
c1e09efb23
- Fixed output for beagle header
- Better description for QualByDepth annotation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2655 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 21:25:56 +00:00
rpoplin
d9df72e1b5
AnalyzeAnnotations now bins variants per each annotation and outputs plots of TiTv ratio as a function of the annotation's value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2654 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 21:15:11 +00:00
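A sketch of per-bin Ti/Tv as described in r2654, assuming fixed-width bins over the annotation value and biallelic SNPs given as ref/alt characters. A transition is A<->G or C<->T; every other change is a transversion. TiTvBins is a hypothetical name, and returning NaN for a bin without transversions is an arbitrary choice.

```java
import java.util.TreeMap;

// Hypothetical sketch of Ti/Tv per annotation bin; TiTvBins is an illustrative name.
final class TiTvBins {
    /** A <-> G and C <-> T are transitions; all other SNP changes are transversions. */
    static boolean isTransition(char ref, char alt) {
        String pair = "" + Character.toUpperCase(ref) + Character.toUpperCase(alt);
        return pair.equals("AG") || pair.equals("GA") || pair.equals("CT") || pair.equals("TC");
    }

    private final double binWidth;
    private final TreeMap<Long, long[]> bins = new TreeMap<>(); // bin -> {Ti, Tv}

    TiTvBins(double binWidth) {
        this.binWidth = binWidth;
    }

    private long binOf(double value) {
        return (long) Math.floor(value / binWidth);
    }

    void add(double annotationValue, char ref, char alt) {
        long[] counts = bins.computeIfAbsent(binOf(annotationValue), b -> new long[2]);
        counts[isTransition(ref, alt) ? 0 : 1]++;
    }

    /** Ti/Tv for the bin holding the value; NaN if that bin has no transversions. */
    double ratio(double annotationValue) {
        long[] counts = bins.get(binOf(annotationValue));
        if (counts == null || counts[1] == 0) return Double.NaN;
        return (double) counts[0] / counts[1];
    }
}
```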
chartl
f51cffe220
Alteration of PlinkToVCF to be much more flexible about parsing .ped file headers, which can have one of a number of different standard fields, and be in different orders.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2650 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 18:02:28 +00:00
chartl
5b2a1e483e
Renamed SequenomToVCF as PlinkToVCF. Wiki will be changed accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2649 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 17:35:20 +00:00
asivache
74779a9a78
First version of the tool that tries to determine the indel error rate (basically, it counts indels that look like sequencing/alignment errors, such as a single observation at a deeply covered locus, and reports the rate of their occurrence)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2648 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 15:28:20 +00:00
hanna
d25a2fe120
Better handling of enums by the command-line argument system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2647 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 21:36:46 +00:00
ebanks
9c7b281b4f
Set default value for max_coverage to be 100K (since 10K is too small).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2646 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 20:15:25 +00:00
hanna
1e9fe2a334
Clean up error output when enums have missing arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2645 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:48:26 +00:00
aaron
8d1d37302c
a quick change to GLF to keep as much precision in our likelihoods as long as possible, before we put it into byte space. Sanger was doing a diff at low coverage and noticed our calls didn't contain as much precision as theirs. Updated the MD5 for unified genotyper output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2644 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:36:49 +00:00
hanna
908d399670
Bug fix for help text / version number - help text retriever was crashing in the debugger if help text hadn't been built.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2643 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:18:19 +00:00
chartl
ab289872e4
Changes:
- Annotations return null when given pileups with no second-base information
- SequenomRodWithGenomeLoc -- better handling of indels
Eric: I made two small changes to the new Genotype interface that we should talk about (they basically have to do with allele/genotype representation):
Allele - added a new UNKNOWN_POINT_MUTATION to AlleleType. If I see a sequenom genotype AG, one's got to be ref and one's got to be SNP, but until I have
an actual reference base in hand, I don't know which is which. That's what this entry is for.
Genotype - added an enum class StandardAttributes for dealing with things like deletion/inversion length. This is probably not the way we want to
represent indels, so we should talk about this. Plus now that there's a direct link between my ROD and the genotype; when we do decide
how to deal with indels, we'll be forced to alter the SequenomRodWithGenomeLoc accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2642 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 16:45:17 +00:00
aaron
a1b4cc4baf
changes to intelligently log overflowing locus pile-ups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2640 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 08:09:48 +00:00
ebanks
4ac9eb7cb2
- Smarter strand bias calculation
- Better debug/verbose printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2639 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 03:01:26 +00:00
depristo
ff66023d83
Trivial change to support filter field in VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2636 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:56:22 +00:00
asivache
4625261d79
Bug fix: alignments ending with 'I' were not counted into the overall coverage which resulted in inaccurate stats, and in rare occasions outright messed up ones.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2635 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:12:16 +00:00
hanna
8dafd26100
Print out the current version number in the application header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2633 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:58:36 +00:00
depristo
9e0ae993c7
-B 1kg_ceu,VCF,CEU.vcf -B 1kg_yri,VCF,YRI.vcf system supported to allow 1KG % (like dbSNP%)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2632 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:33:13 +00:00
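The -B arguments in r2632 bind a track as name,type,file. A minimal parser for that triple is sketched below; TrackBinding is a hypothetical name, and treating any track whose name starts with "1kg" as a 1000 Genomes track is an assumption inferred from the example names, not a documented rule.

```java
import java.util.Locale;

// Hypothetical sketch of a "name,type,file" track binding as passed to -B.
final class TrackBinding {
    final String name;
    final String type;
    final String file;

    TrackBinding(String name, String type, String file) {
        this.name = name;
        this.type = type;
        this.file = file;
    }

    static TrackBinding parse(String arg) {
        String[] parts = arg.split(",", 3); // limit 3: anything after the second comma is the file
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected name,type,file but got: " + arg);
        }
        return new TrackBinding(parts[0], parts[1], parts[2]);
    }

    /** Assumed convention: tracks named "1kg..." feed the 1KG% column. */
    boolean isThousandGenomesTrack() {
        return name.toLowerCase(Locale.ROOT).startsWith("1kg");
    }
}
```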
rpoplin
c98df0a862
Updated solid_recal_modes to work with bfast aligned data. Added an integration test that uses the BFAST file provided by TGen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2630 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:18:02 +00:00
chartl
53352e1bb4
First pass at a sequenom ROD. Nothing uses it; currently undergoing testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2629 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 17:09:36 +00:00
hanna
1488578617
Working with Aaron to get svnversion running within the build system. This change will break the build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2628 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 16:55:42 +00:00
rpoplin
bca436578f
Added the -maxQ argument to the list of arguments in the PG tag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2627 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:55:23 +00:00
rpoplin
d61cafd19f
Make the formatting of the list of args in the PG tag consistent.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2626 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:31:37 +00:00
rpoplin
a12465b6d5
The recalFile argument is no longer added into the PG tag of a bam produced by TableRecalibration. Based on a request from the Sanger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2625 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:25:57 +00:00
rpoplin
ba19afd529
Draft version of AnalyzeAnnotations which creates plots of cumulative TiTv ratio versus filter value per each annotation in the input VCF rod. Minor cleanup of recalibration walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2623 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 20:47:10 +00:00
kiran
ff6877a15e
Added a forgotten column label
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2622 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 01:00:52 +00:00
kiran
dd6d5aadf9
Computes empirical confusion matrices, optionally with up to five bases of preceding context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2621 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 00:55:12 +00:00
ebanks
12453fa163
Misc cleanup of UG args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2620 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 04:38:52 +00:00
ebanks
b8cdf64c20
Better descriptions for max reads/downsampling args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2618 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 02:30:27 +00:00
depristo
64225b28fd
Convenience methods for getting the VCFReader and VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2614 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:31 +00:00
depristo
d0af7f6c7b
Now analyzes filtered SNP like all, novel subsets; support for selecting a single sample to analyze from a multi-sample VCF, support for trivial selection of records with INFO field key/value pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2613 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:04 +00:00
depristo
8ae8e120f8
New annotateUnion operation -- provides clearer annotations on where a call came from when unioning two VCF call sets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2612 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:20:37 +00:00
depristo
41392f8ff5
functions for setting genotype records and alternate bases; function for getting all rods implementing VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2611 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:19:43 +00:00
hanna
ac4756db20
Add the svn version on the fly to the version number properties.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2607 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 00:28:01 +00:00
hanna
420cef4094
Added version numbers to the help doclet extractor. Since the help system is behaving
more like a resource bundle at this point, changed it over to use the Java ResourceBundle
support classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2606 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 23:31:29 +00:00
rpoplin
4de7d6a59b
Initial checkin of skeleton code for AnalyzeAnnotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2605 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:52:34 +00:00
hanna
930082314a
Put a major.minor version into the GATK Javadoc for reading. Also,
update some straggler packages to the new package-info.java format introduced in 1.5.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2604 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:48:30 +00:00
mmelgar
3063224446
SecondaryBaseTransitionTableWalker now breaks by genotype and read group, is javadoc annotated, and is compatible with ReadBackedPileup's methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2603 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:43:39 +00:00
asivache
7a991421f7
-erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2601 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:11:14 +00:00
asivache
c8c5c176cd
-erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2600 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:07:49 +00:00
asivache
a12933a26d
Bug fixed: now the length of an insertion is determined correctly. Thought I committed this...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2599 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:58:48 +00:00
asivache
404b95183f
This is a LocusWalker, not a RodWalker (thanks Mark!!). RodWalkers currently are not capable of attaching alignment contexts (reads) to the ROD-annotated loci they traverse over...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2596 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:33:41 +00:00
rpoplin
7078219b89
Updating outdated comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2595 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 19:17:52 +00:00
rpoplin
ba2acda406
Clarifying the comment regarding differentiating between first and second of pair in CycleCovariate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2594 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:36:14 +00:00
ebanks
b911b7df82
Fixing the AC annotation to be in line with the VCF spec
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2593 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:28:52 +00:00
rpoplin
f2e539c52f
As per discussions with Tim we are reverting the previous change regarding PairedReadOrderCovariate. The CycleCovariate now differentiates between first and second of pair by multiplying the cycle by -1. PairedReadOrderCovariate has been removed completely.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2592 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:18:59 +00:00
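The sign convention from r2592 in a short sketch. Only the negation for the second read of a pair comes from the commit; the strand handling (counting from the far end for negative-strand reads) is an assumption added for illustration, and CycleSketch is a hypothetical name.

```java
// Hypothetical sketch of the cycle sign convention; CycleSketch is an illustrative name.
final class CycleSketch {
    /**
     * Returns the 1-based machine cycle for a base, negated for the second read of a pair.
     * offset is the 0-based index into the read bases as stored in the BAM.
     */
    static int cycle(int offset, int readLength, boolean negativeStrand, boolean secondOfPair) {
        // Assumption: negative-strand reads are stored reverse-complemented,
        // so their first sequenced base is the last stored base.
        int cycle = negativeStrand ? readLength - offset : offset + 1;
        return secondOfPair ? -cycle : cycle;
    }
}
```

Negating rather than adding a separate covariate keeps first- and second-of-pair cycles in one dimension while still giving them disjoint keys.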
asivache
eae1b73945
Fixed a bug in left-adjusting the indels introduced in previous commit :-/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2591 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 17:41:23 +00:00
rpoplin
df998041a8
Minor change to solid warning message. Added note for a future solid recalibration integration test when we get the required data file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2590 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 16:31:25 +00:00
rpoplin
70df30fc1b
Added method to AlignmentUtils which takes a read's cigar and the refBases char array given to a ReadWalker and returns the aligned reference char array. Bug fix in solid_recal_modes to use this aligned reference array. Recalibrator version number is no longer separate for each of the two walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2589 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 15:36:59 +00:00
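A sketch of the idea behind the AlignmentUtils method from r2589, with its own tiny CIGAR-string parser. It assumes refBases starts at the read's alignment start and returns one character per read base; giving inserted and soft-clipped bases a '-' placeholder is a hypothetical choice, not necessarily what the real method does.

```java
// Hypothetical sketch: walk a CIGAR to line reference bases up with read bases.
final class AlignedRef {
    static final char GAP = '-'; // placeholder for read bases with no reference partner

    /** refBases is assumed to start at the read's alignment start; returns one char per read base. */
    static char[] alignedReference(String cigar, char[] refBases) {
        StringBuilder out = new StringBuilder();
        int refPos = 0;
        int len = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                len = len * 10 + (c - '0'); // accumulate the operation length
                continue;
            }
            switch (c) {
                case 'M': case '=': case 'X': // consume read and reference
                    out.append(refBases, refPos, len);
                    refPos += len;
                    break;
                case 'I': case 'S': {         // consume read only
                    for (int i = 0; i < len; i++) {
                        out.append(GAP);
                    }
                    break;
                }
                case 'D': case 'N':           // consume reference only
                    refPos += len;
                    break;
                case 'H': case 'P':           // consume neither
                    break;
                default:
                    throw new IllegalArgumentException("unknown CIGAR operator: " + c);
            }
            len = 0;
        }
        return out.toString().toCharArray();
    }
}
```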
ebanks
2a116bb5d6
Made the VCF validator a simple rod walker instead of having it be in a separate package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2588 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 06:39:06 +00:00
hanna
b19bb19f3d
First successful test of new sharding system prototype. Can traverse over reads from a single
BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
aaron
db9570ae29
Looks bigger than it is:
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code to the argument parsing system for default enums; we needed this so we could preserve the current unsafe flag and, at the same time, allow finer-grained control of unsafe operations. You can now specify:
"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 00:14:35 +00:00
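A rough sketch of the "default enum" idea: a bare -U maps to a catch-all value, while -U NAME selects a single operation. The enum constants are taken from the commit message, but the type names UnsafeOperation and UnsafeFlag are hypothetical, not the actual GATK argument classes.

```java
import java.util.EnumSet;

// Hypothetical enum: constant names come from the commit message, the type name is assumed.
enum UnsafeOperation { ALL, ALLOW_UNINDEXED_BAM, NO_READ_ORDER_VERIFICATION }

final class UnsafeFlag {
    // A bare "-U" carries no value, so it falls back to the default constant ALL.
    static UnsafeOperation parse(String valueOrNull) {
        return valueOrNull == null ? UnsafeOperation.ALL : UnsafeOperation.valueOf(valueOrNull);
    }

    // ALL enables every unsafe operation; any other constant enables only itself.
    static boolean isAllowed(EnumSet<UnsafeOperation> enabled, UnsafeOperation op) {
        return enabled.contains(UnsafeOperation.ALL) || enabled.contains(op);
    }
}
```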
kiran
04fdbbfa65
This is the beginning of a new version of VariantEval that can cut VCF files up in a variety of ways with JEXL expressions, select one sample out of a multi-sample VCF, and load analysis modules dynamically.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2584 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:58 +00:00
asivache
df63f51253
No functional changes, just syncing; only some commented-out debugging prints were added...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2583 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:15 +00:00
asivache
d85461c463
MergingIterator completely redone. It is no longer a generic class (sorry guys); instead, it is tailored for merging ROD tracks. This implementation peeks at the location of the next ROD annotation in each track, but does not actually read that ROD from the underlying stream until its location is reached and it is time to return the object. As a result, the underlying ROD track iterators (registered in the resource pool!) are no longer advanced prematurely past the current position all the way to the next ROD record, wherever it is, so the sharding system can reuse them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2582 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:43:36 +00:00
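The peek-then-advance strategy described above can be illustrated with a small, hypothetical merger. Locations are simplified to single long positions on one contig (an assumption), and LocationPeekingTrack, ListTrack and TrackMerger are illustrative names, not GATK classes. Only the track holding the smallest next location is advanced; every other track is merely peeked, so it never moves past the current position.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical track interface: exposes the next record's location without consuming it.
interface LocationPeekingTrack<T> {
    boolean hasNext();
    long peekNextLocation();   // location the next call to next() would return
    T next();
}

// List-backed track for illustration; records are (position, value) pairs.
final class ListTrack implements LocationPeekingTrack<String> {
    private final long[] positions;
    private final String[] values;
    private int index = 0;

    ListTrack(long[] positions, String[] values) { this.positions = positions; this.values = values; }

    public boolean hasNext() { return index < positions.length; }

    public long peekNextLocation() {
        if (!hasNext()) throw new NoSuchElementException();
        return positions[index];
    }

    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return values[index++];
    }
}

// Merges tracks in location order, reading a record only from the track that is due next.
final class TrackMerger<T> {
    private final List<LocationPeekingTrack<T>> tracks;

    TrackMerger(List<LocationPeekingTrack<T>> tracks) { this.tracks = tracks; }

    boolean hasNext() {
        for (LocationPeekingTrack<T> t : tracks) if (t.hasNext()) return true;
        return false;
    }

    T next() {
        LocationPeekingTrack<T> best = null;
        for (LocationPeekingTrack<T> t : tracks) {
            if (t.hasNext() && (best == null || t.peekNextLocation() < best.peekNextLocation())) best = t;
        }
        if (best == null) throw new NoSuchElementException();
        return best.next();   // the other tracks were only peeked, never advanced
    }
}
```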
asivache
c0891d512f
Added: peekNextLocation(). It's quite hard (and probably never necessary) to make a seekable iterator peekable, but it is quite easy and useful to be able to peek at just the next location the iterator will jump to on the next call to next().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2581 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:38:19 +00:00
ebanks
a082b948a3
Support throughout for S and N cigar elements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2579 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 03:45:42 +00:00
chartl
424d1b57f7
SequenomToVCF now allows the user to specify QC filters, which will appear in the FILTER field of the output VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2577 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 23:22:37 +00:00
rpoplin
49c44e7b36
PairedReadOrderCovariate is now a standard covariate, and because of this CycleCovariate no longer multiplies the cycle by -1 for second-of-pair reads. Added PairedReadOrderCovariate to some of the integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2574 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 20:09:10 +00:00
hanna
05575e2e56
Better bounding for the locus window. Don't make the locus window calculation blow up if the GenomeLoc ends up being outside the reference. Force the blowup elsewhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2573 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 17:03:54 +00:00
ebanks
8ca5bba738
We emit genotype data in the VCF record if the format string instructs us to, regardless of whether or not genotypes are provided (checking for provided genotypes was the wrong test).
SequenomToVCF now correctly emits no-calls when probes fail.
Re-enabled the SequenomToVCF integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2572 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:40:27 +00:00
chartl
6d1107a4ed
Update to SequenomToVCF.
The output changes slightly, so the integration test is temporarily disabled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2571 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:32:05 +00:00
ebanks
f99586f91b
Added an integration test for BEAGLE and verbose output in UG.
Minor cleanup of VCFRecord code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2570 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 03:55:24 +00:00
hanna
02e23e2d9c
Threading support for beagle output files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2569 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 02:42:16 +00:00
aaron
0513690416
Two fixes in the new cached dbSNP code:
-isBiallelic would incorrectly say triallelic sites are biallelic.
-getAlternateAlleleList was broken: since the new cached list is immutable, we couldn't remove list items.
Also added a dbSNP-validating walker to the one-offs, for testing the new b37 dbSNP 130 rod.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2568 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 00:27:34 +00:00
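A hypothetical sketch of both fixes, assuming the alleles are cached as an unmodifiable List<String>. CachedDbSnpRecord is an illustrative name, not the GATK rod class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a cached dbSNP record.
final class CachedDbSnpRecord {
    private final List<String> alleles;

    CachedDbSnpRecord(List<String> alleles) {
        // The cached list is immutable: calling remove() on it throws UnsupportedOperationException.
        this.alleles = Collections.unmodifiableList(new ArrayList<>(alleles));
    }

    // Copy into a mutable list first, then drop the reference allele.
    List<String> getAlternateAlleleList(String refAllele) {
        List<String> alternates = new ArrayList<>(alleles);
        alternates.remove(refAllele);
        return alternates;
    }

    // Exactly two alleles; a triallelic site (three alleles) must not pass.
    boolean isBiallelic() {
        return alleles.size() == 2;
    }
}
```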
asivache
a138bad95a
A rare but not-so-subtle bug fixed: a funky alignment (a kind that should not have been generated in the first place) could make the indel left-adjusting method overshoot the read start and build a cigar like -3M6I...
Also, a few minor fix-ups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2567 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 21:29:50 +00:00
rpoplin
b51f4aae11
Updating the recalibrator to make use of StingSAMFileWriter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2566 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 20:58:27 +00:00
rpoplin
c8ad025ad0
cleaning up unused import statements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2565 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:52:37 +00:00
rpoplin
189829841b
The recalibrator now uses all input RODs when looking for known polymorphic sites, not just the one named dbsnp. Added an integration test that uses both dbsnp and an input VCF file and skips over the union of the two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2564 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:50:39 +00:00
aaron
16777e3875
more fixes for the empty interval list problem; you can now run LocusWindow traversals with an empty interval list, but the GATK will give you a warning (unless you're running in unsafe mode).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2563 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:47:43 +00:00
hanna
35a4fcc481
Additional sanity checking: make sure the user can't alter the header / compression level / presorted state of a file to which SAMRecords have already been written.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2562 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:39:41 +00:00
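One way to enforce this kind of sanity check is sketched below: every setter refuses to run once a record has been written. LazyRecordWriter, its String records and its default settings are hypothetical stand-ins, not the Picard SAMFileWriter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer: header, compression level and presorted flag may change only
// until the first record is written; afterwards every setter throws.
final class LazyRecordWriter {
    private String header = "";          // default values are assumptions
    private int compressionLevel = 5;
    private boolean presorted = true;
    private final List<String> records = new ArrayList<>();

    void setHeader(String header) { requireNothingWritten("header"); this.header = header; }
    void setCompressionLevel(int level) { requireNothingWritten("compression level"); this.compressionLevel = level; }
    void setPresorted(boolean presorted) { requireNothingWritten("presorted state"); this.presorted = presorted; }

    void addRecord(String record) { records.add(record); }

    private void requireNothingWritten(String setting) {
        if (!records.isEmpty())
            throw new IllegalStateException("cannot change the " + setting + " after records have been written");
    }
}
```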
ebanks
03b7d5f5c7
1. Fixed a small but embarrassing bug in the weighted Allele Balance annotation calculation.
2. Made RankSumTest abstract; added 2 subclasses: BaseQualityRST and MappingQualityRST (the latter based on a suggestion from Mark Daly). Untested, so they're still experimental.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2561 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:33:53 +00:00
hanna
58999a8e9d
Enhance the I/O management system to support custom headers and set the presorted flag from the initialize() method (or at any time before the first SAM record is written).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2560 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:21:42 +00:00
aaron
3c5f5177b1
check to see if the parsed interval list is empty, since we now allow interval files that are empty. If so, make sure we default to a non-interval based traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2559 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 17:52:27 +00:00
ebanks
040fdfee61
Cleaned up the interface to VCFRecord. It's now possible (and easy) to create records and then write them with a VCFWriter.
I've updated HapMap2VCF to use the new interface; Chris agreed to take care of SequenomToVCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2558 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 21:42:12 +00:00
ebanks
42aff1d2c3
The Annotator in general should be able to annotate monomorphic or tri-allelic sites.
It's up to the individual annotations to decide whether or not to annotate them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2556 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:52:18 +00:00
rpoplin
11f91b3c95
Reverting Eric's previous change because it killed the PG tag in the output BAM file header. Added a new -compress command-line argument to set the compression level of the output BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2555 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:02:56 +00:00
chartl
dfa3c3b875
Added:
SequenomToVCF - Takes a Sequenom PED file and converts it to a VCF file with the proper metrics for QC. It's currently a rough draft, but it is working as expected on a test PED file, which is included as an integration test.
Modified:
VCFGenotypeCall -- added a cloneCall() method that returns a clone of the call
HapMap2VCF -- removed a VCFGenotypeCall object that gets instantiated and modified but never used
(caused me all kinds of confusion when I was basing SequenomToVCF on it)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2554 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 17:17:21 +00:00
rpoplin
62dd2fa5be
Fixing another bug in SOLiD recal regarding negative-strand reads. The isInconsistentColorSpace method misused the inconsistent tag added by parseColorSpace: like the color space tag, the inconsistent tag is in the direction of the read, not in the direction of the reference like everything else. This affects the recalibrated quality scores, but the improvement in SNP calling performance is minor when using the default UG settings (min base quality 10).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2553 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 14:28:52 +00:00
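To illustrate the orientation issue: a per-base tag stored in read order has to be reversed before it can be indexed alongside reference-oriented data for a negative-strand read. ReadOrientation is a hypothetical helper, a sketch of the general idea and not the isInconsistentColorSpace fix itself.

```java
import java.util.Arrays;

final class ReadOrientation {
    // Per-base tags such as the color-space inconsistency tag are stored in read order.
    // For a negative-strand read, reverse a copy so index i lines up with reference-oriented base i.
    static byte[] toReferenceOrientation(byte[] readOrientedTag, boolean negativeStrand) {
        byte[] result = Arrays.copyOf(readOrientedTag, readOrientedTag.length);
        if (negativeStrand) {
            for (int i = 0, j = result.length - 1; i < j; i++, j--) {
                byte tmp = result[i];
                result[i] = result[j];
                result[j] = tmp;
            }
        }
        return result;
    }
}
```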
ebanks
971834ca90
Added a walker to the VCF tools collection: one that combines VCF records. Both merges and unions are supported (see the documentation... when it gets written this week).
Also moved some code that pulls samples out of rods from VCFUtils into SampleUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2552 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 06:45:11 +00:00
ebanks
80af0f2f54
Changed the OUTPUT_BAM_FILE argument from String to SAMFileWriter and removed the call to close().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2551 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 03:45:54 +00:00
hanna
7893aaefe9
Updates to chunk iteration. Includes the return of the dreaded *2.java files; hopefully I can find a way to kill these off before the Picard patch is ready.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2550 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 20:20:56 +00:00
ebanks
fcce77c245
Added a -beagle option to emit a likelihoods file for use with the BEAGLE imputation engine; still experimental.
(Also converted getPileup -> getBasePileup.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2549 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 18:41:04 +00:00
rpoplin
9cbae53ee1
Bug fixes for both the SET_Q_ZERO and REMOVE_REF_BIAS SOLiD recal modes regarding proper handling of negative-strand reads. These changes yield a minor improvement in HapMap sensitivity.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2548 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 15:19:22 +00:00
ebanks
d5ab002449
Curiously, it seems I never set the default base quality used by the Genotyper to 10. It's done now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2546 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:02:01 +00:00
ebanks
b468369dfa
-UG's call into VariantAnnotator now uses the full alignment context (as opposed to the filtered one)
-MQ0 annotation is now standard again
-Added AC and AN annotations to VCF output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2545 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 05:40:42 +00:00
rpoplin
f587ff46af
Tile is now a standard covariate. By default, TileCovariate returns -1 if the tile can't be derived from the read's name. Added a new command-line option, -throwTileException, which forces TileCovariate to throw an exception if the tile can't be derived for a read. Singleton covariates, such as any read group without tile info, must be skipped over in TableRecalibration so that the sequential formulation doesn't apply the same correction more than once. The TileCovariate class has been added to the Early Access package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2544 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:51:41 +00:00
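A sketch of the tile lookup with its -1 fallback, assuming old-style Illumina read names of the form machine:lane:tile:x:y (newer naming formats place the tile in a different field). TileFromReadName is a hypothetical name, and the boolean flag only mirrors the intent of -throwTileException.

```java
final class TileFromReadName {
    // Assumes names like "machine:lane:tile:x:y". Returns -1 when no tile can be parsed,
    // or throws when throwOnFailure is set.
    static int tile(String readName, boolean throwOnFailure) {
        String[] fields = readName.split(":");
        if (fields.length >= 5) {
            try {
                return Integer.parseInt(fields[2]);
            } catch (NumberFormatException ignored) {
                // fall through to the failure handling below
            }
        }
        if (throwOnFailure)
            throw new IllegalArgumentException("cannot derive tile from read name: " + readName);
        return -1;
    }
}
```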
asivache
d01bde36a4
Make sure that the reference view holds enough bases to pass the full-length deleted sequence to the walker's map() function in extended event mode. This addresses the problem of a deletion crossing the shard boundary, where an attempt to extract the deleted bases resulted in a crash.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2543 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:37:22 +00:00
asivache
e9bc85c188
Now has methods to 1) check whether a location is within the bounds of the reference view, and 2) expand the reference view (i.e. expand the bounds and reload the reference sequence) to accommodate a specified location. The second method can be called directly, since it performs the check itself and returns immediately if the location is already within the bounds. The costly reloading of the reference sequence occurs only when the location is not fully contained within the current bounds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2542 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:35:17 +00:00
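The two methods described above can be sketched as follows. ReferenceWindow, its 0-based inclusive coordinates, and the use of an in-memory contig String in place of an indexed FASTA are all assumptions made for illustration.

```java
// Hypothetical reference window over one contig; positions are 0-based and bounds inclusive.
final class ReferenceWindow {
    private final String contig;   // the whole contig stands in for an indexed FASTA lookup
    private int start;
    private int stop;
    private String bases;
    private int reloads = 0;       // number of (costly) reloads, kept for illustration

    ReferenceWindow(String contig, int start, int stop) {
        this.contig = contig;
        this.start = start;
        this.stop = stop;
        this.bases = contig.substring(start, stop + 1);
    }

    // 1) Is [locStart, locStop] already inside the current bounds?
    boolean contains(int locStart, int locStop) {
        return locStart >= start && locStop <= stop;
    }

    // 2) Expand the bounds and reload only when needed, so it is safe to call unconditionally.
    void expandToInclude(int locStart, int locStop) {
        if (contains(locStart, locStop)) return;
        start = Math.max(0, Math.min(start, locStart));
        stop = Math.min(contig.length() - 1, Math.max(stop, locStop));
        bases = contig.substring(start, stop + 1);
        reloads++;
    }

    char baseAt(int pos) { return bases.charAt(pos - start); }

    int reloadCount() { return reloads; }
}
```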
asivache
7f91b4d824
Bug fix. It would be nice if we could extract ROD annotations for the whole length of an extended event (indel), and we tried... But alas, it does not work with the current ROD system: after extracting ROD data spanning more than one reference base for a deletion, the ROD iterator crashes when it attempts to reload annotations for the next reference base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2541 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:30:55 +00:00
rpoplin
5f58492401
A rogue QualityUtils.MAX_REASONABLE_Q_SCORE managed to get through my previous bug fix. It should instead check the command line -maxQ argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2540 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:17:39 +00:00
ebanks
c7a8dffa89
Check for division by 0 in annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2539 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 19:27:15 +00:00
ebanks
9a658e6b18
-Fixed VCF header line bug
-Added a useful trim() method for Strings that strips characters other than whitespace
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2538 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 17:51:41 +00:00
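A minimal version of such a trim, assuming it strips one given character from both ends of the string (the actual signature is not shown in the commit). StringTrim is a hypothetical name.

```java
final class StringTrim {
    // Strips leading and trailing occurrences of ch; String.trim() only removes whitespace.
    static String trim(String s, char ch) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == ch) start++;
        while (end > start && s.charAt(end - 1) == ch) end--;
        return s.substring(start, end);
    }
}
```

For instance, trimming '"' from a quoted VCF header value such as "abc" (with quotes) leaves abc.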
ebanks
b643a513bb
Minor interface change for VCFGenotypeRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2537 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 16:48:09 +00:00
andrewk
431e9c2c8b
Add dbSNP ID to VCF output records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2536 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 15:30:04 +00:00
depristo
076481f786
Fixes to mergeVCF -- now correctly supports merging of filter fields. Also removed the incorrect hasFilteringCodes() function. Updated integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2535 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 14:50:13 +00:00
rpoplin
cea544871d
Fixed an issue with recalibrating original quality scores above Q40. There is a new option, -maxQ, which sets the maximum possible quality score when a RecalDatum computes its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting quality scores above Q40. Added an integration test that uses this new -maxQ option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 13:50:30 +00:00
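A sketch of a Phred conversion capped by a -maxQ-style limit. It uses the plain formula Q = -10 * log10(mismatches / observations); the real RecalDatum may smooth the counts differently, and PhredFromMismatchRate is a hypothetical name.

```java
final class PhredFromMismatchRate {
    // Converts an empirical mismatch rate into a Phred-scaled quality, capped at maxQ.
    static int qualityFromCounts(long mismatches, long observations, int maxQ) {
        if (observations <= 0) throw new IllegalArgumentException("no observations");
        if (mismatches == 0) return maxQ;   // a zero rate would otherwise map to infinite quality
        double q = -10.0 * Math.log10((double) mismatches / observations);
        return (int) Math.min(maxQ, Math.round(q));
    }
}
```

With one mismatch in 100,000 observations the raw quality is Q50, so a cap of 40 clips it while a cap of 60 keeps it, which is why a configurable maximum matters for scores above Q40.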
ebanks
6c739e30e0
1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease.
2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext. This is by no means complete - further updates coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 05:51:10 +00:00
depristo
a9245a58e2
Fix for incorrect exception throwing in VCFRecord. It is reasonable to ask for the non-ref allele frequency even at reference sites. This was only passing tests because isReference was broken.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2532 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 01:18:30 +00:00
depristo
7215526810
Fix to isReference() in VCFRecord. Changed VariantCounter to correctly count only non-genotype variants, and updated VariantEvalWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2531 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 00:03:29 +00:00
andrewk
6c4ac9e663
Updated HapMap2VCF to use the VCFGenotypeWriterAdapter interface; fixed a bug in VCFParameters that affects VariantsToVCF and HapMap2VCF when the reference is lower-cased; added an integration test for HapMap2VCF that checks for the lower-case issue by testing against an hg18 region that has lower-cased bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2530 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 21:27:11 +00:00
aaron
576594eda2
clean-up of the GATK paper genotyper, and better output formatting for the simple call format we emit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2529 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 20:54:56 +00:00
chartl
7e3e714d3c
Moving experimental annotations from core to oneoffs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2528 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 19:34:10 +00:00
chartl
a32245f7d2
Modifications:
QualityUtils - Stole the BaseUtils code for flipping reads around and applied it to quality scores
SecondBaseSkew - Nothing's really different, just a commented line
Additions (experimental annotations for future development of second-base annotation)
** I DO NOT INTEND FOR ANYONE TO USE THESE **
- ProportionOfNonrefBasesSupportingSNP
- ProportionOfSNPSecondBasesSupportingRef
- ProportionOfRefSecondBasesSupportingSNP
+ I hope these are self-explanatory
- QualityAdjustedSecondBaseLod
+ Adjust lod-score by 10*log10[P[second bases are as observed]]
Added walker:
QualityScoreByStrand - one-off project that's being saved in case I ever need it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2527 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 19:18:07 +00:00
asivache
eb899741e1
Reverting the last changes: no caching.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2526 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 18:59:37 +00:00
asivache
a17d725c35
Cache pileup bases and mapping quals after the first call to getBases() and getMappingQuals(), respectively. Subsequent calls to these methods will return the cached arrays.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2525 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 18:05:00 +00:00
ebanks
d6fb19bb67
Don't hard-code base qual max
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2524 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 17:21:44 +00:00
rpoplin
75809100c6
Use inheritance so that shared code isn't duplicated between the RecalDatums
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2523 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:45:16 +00:00
ebanks
fdd14e1a01
Proposed interface for VariantContext. It's currently an interface so it doesn't break the build...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2521 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:31:39 +00:00
rpoplin
e011a1b6f8
Cut the memory footprint of the RecalDatum in half to improve performance of CountCovariates when run with many covariates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2520 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:12:27 +00:00
rpoplin
370a365147
Small runtime improvement in TableRecalibration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2519 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:51:12 +00:00
ebanks
b745c2f8d7
Fix for Jared: don't blow up if there are no samples in the input (since that's allowed) - but warn the user just in case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2518 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:37:06 +00:00
depristo
1e462419da
Trivial code restructuring, and commented out a failed attempt to support sample selection with VCF. VariantEval2 go go go
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2516 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:04:27 +00:00
depristo
f857159343
useful convenience function to get a genotype associated with a particular sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2515 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:03:07 +00:00
depristo
34519b3e3b
Better printing support for false positives and false negatives in concordance tables
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2514 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:02:40 +00:00
depristo
592749a7c1
isNBase method
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2513 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:01:51 +00:00
depristo
5ce11c3dad
toString method
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2512 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:01:20 +00:00
rpoplin
1c90e6a954
More informative error message in AnalyzeCovariates and cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2511 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:56:29 +00:00
depristo
bca3d1b943
useful convenience function to get a genotype associated with a particular sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2510 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:53:56 +00:00
depristo
ec774f62be
Some checking to protect the BasicGenotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2509 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:53:24 +00:00
rpoplin
71ecbe75d7
AnalyzeCovariates would crash with a 'too many open files' exception when spawning Rscript jobs for every read group at once. It now waits for some jobs to finish before spawning the rest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2508 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:19:02 +00:00
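The throttling idea can be sketched with a fixed-size thread pool: jobs beyond the pool size wait in its queue until a running one finishes. In AnalyzeCovariates each job would launch an Rscript process and wait for it (for example with ProcessBuilder.start() followed by Process.waitFor()); here the jobs are plain Runnables so the sketch stays self-contained. BoundedJobRunner is a hypothetical name, and the walker may throttle its jobs differently.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BoundedJobRunner {
    // Runs every job, but never more than maxConcurrent at the same time: the pool's fixed
    // size makes the remaining jobs wait in its queue until a running one finishes.
    static void runAll(List<Runnable> jobs, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            for (Runnable job : jobs) pool.submit(job);
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.HOURS))
                throw new IllegalStateException("jobs did not finish in time");
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
    }
}
```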
depristo
21a50eedb5
Simple extension to VariantEval: --includeFilteredRecords will now keep filtered VCF records so you can see what the entire call set looks like. Looking forward to VariantEval v2 from Kiran.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2506 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:59:09 +00:00
depristo
8d13597a27
Temporary command-line support to enable rod walkers; if you know what you are doing, this is safe.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2505 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:15:36 +00:00
ebanks
d8351cb9fc
Give Annotations access to rod data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2504 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 18:53:01 +00:00
ebanks
8b087305f3
Added back the MQ0 annotation; however, it's not yet standard (since MQ0 reads are filtered out by default in the genotyper), but it will work when using the Annotator as a standalone tool.
While I was at it, changed getPileup to getBasePileup to remove all of the deprecation warnings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2502 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 17:07:19 +00:00
hanna
a4b69d0adf
Misc bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2501 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 14:48:19 +00:00
depristo
c209ba55aa
More informative error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2499 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 13:55:20 +00:00
rpoplin
0a6bd5a270
CycleCovariate is now one-based so that 0 and -0 don't collide with each other. SOLiD recal modes now only change the inconsistent base and the previous base (along the direction of the read), instead of the bases both before and after it. Removed estimatedNumberOfBins from the Covariate interface because it wasn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2498 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 20:52:15 +00:00
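Why one-based cycles matter: in integer arithmetic -0 == 0, so a zero-based cycle negated for second-of-pair reads puts the first cycle of both mates into the same bin. The hypothetical helper below only contrasts the two encodings. Note that a later commit (49c44e7b36) drops the negation altogether in favor of PairedReadOrderCovariate.

```java
final class CycleKey {
    // Zero-based cycles: negating offset 0 still gives 0, so both mates' first cycles collide.
    static int zeroBased(int offset, boolean secondOfPair) {
        return secondOfPair ? -offset : offset;
    }

    // One-based cycles: the first cycle is 1 for the first mate and -1 for the second.
    static int oneBased(int offset, boolean secondOfPair) {
        int cycle = offset + 1;
        return secondOfPair ? -cycle : cycle;
    }
}
```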
ebanks
ed2fff13aa
-Misc improvements to VCF code
-Small fix to callset concordance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2497 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 02:28:47 +00:00
hanna
29c129aced
Added a very primitive read fishing walker with lots of hard coding. Fixed bugs encountered when testing read fishing in E. coli.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2496 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 00:54:57 +00:00
ebanks
7b702b086f
You don't need to be bi-allelic to have a non-ref alt allele frequency, but you do have to be a variant.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2495 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 22:02:39 +00:00
ebanks
b668d32cf1
Updated the min mapping quality and min base quality defaults to 10 (and updated all integration tests), as suggested by Mark.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2494 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 21:31:04 +00:00
hanna
b6ecc9e151
Support for ad-hoc reference sequences. Also reenabled BWA/Java integration test, which was commented out and the data backing it up deleted without my knowledge. Unfortunately, since the data was deleted, I had to regenerate the data and a new md5. Hopefully the aligner output is still correct.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2493 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-02 20:19:14 +00:00
asivache
46362ce532
In extended event lines, now prints deletions in verbose format as well (e.g. "-AAT")
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2490 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:57:20 +00:00
asivache
a18e31f5b8
If the alignment context at the locus holds an extended event, get rod metadata and (importantly) reference bases for the whole span of the event (if it is a deletion, that is; insertions still have length 0 on the ref!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2489 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:56:25 +00:00
asivache
a41cb0701b
Can now generate a verbose String representation of deletions (e.g. "-AAT") if reference bases are provided as an argument to getEventStringsWithCounts().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2488 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:54:50 +00:00
asivache
89791d730e
Compute and cache the length of the longest deletion observed at the site; ReadBackedExtendedEventPileup now has a getter to access that value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2487 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:19:39 +00:00
asivache
8932e67325
Removed the sanity check that required the GenomeLoc argument to be exactly 1 base long. We need to relax this in order to pass around a reference context containing the full-length chunk of deleted reference bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2485 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 20:14:08 +00:00
hanna
497ae700c4
A rethink of the existing BAM block extraction code: rather than working in chunk space directly, stream data in block space, converting to chunk space on demand.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2484 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:19:51 +00:00
rpoplin
80658fd99e
AnalyzeCovariates gets the same performance improvements as the recalibrator. NHashMap class is removed completely.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2483 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:10:10 +00:00
rpoplin
9b2733a54a
Misc cleanup in the recalibrator related to the nested hash map implementation. CountCovariates no longer creates the full flattened set of keys and iterates over them. The output CSV file is now sorted by default, but a new option, -unsorted, can be used to save a little run time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2482 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 16:58:04 +00:00
asivache
c928347c0c
Extended event pileups are now more verbose: following the sequence of 'D', 'I', and '.' symbols, the actual distinct events are listed along with their counts (example: +AAA:3,+AAC:1 for a total of 4 indel observations, with 3 reads showing +AAA and one read showing +AAC).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2480 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:44:18 +00:00
asivache
8330058216
Method added: getEventStringsWithCounts()
Returns a list of Pairs <String,Integer>, where each pair consists of a unique indel event observed at the site and the total number of observations of that event. The String representation of insertions is verbose (e.g. +ACT), while deletions are represented as "5D" (the read-backed pileup has no reference information, so we cannot get the actual sequence of deleted bases).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2479 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:41:58 +00:00
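A sketch of the event strings and counts described in this entry, together with the verbose "-AAT" deletions and the "+AAA:3,+AAC:1" rendering described in later entries above. It uses Map.Entry in place of the original Pair<String,Integer>; IndelEventCounter is hypothetical, and keeping events in first-seen order is an assumption about the original ordering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndelEventCounter {
    // Insertions are verbose ("+ACT"). Deletions are "5D" when no reference is available,
    // or verbose ("-AAT") when the deleted reference bases are supplied.
    static String insertion(String insertedBases) { return "+" + insertedBases; }

    static String deletion(int length, String deletedRefBasesOrNull) {
        return deletedRefBasesOrNull == null ? length + "D" : "-" + deletedRefBasesOrNull;
    }

    // Counts each distinct event string, keeping the order in which events were first seen.
    static List<Map.Entry<String, Integer>> countEvents(List<String> events) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String e : events) counts.merge(e, 1, Integer::sum);
        return new ArrayList<>(counts.entrySet());
    }

    // Renders the counts in the pileup style "+AAA:3,+AAC:1".
    static String format(List<Map.Entry<String, Integer>> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> c : counts) {
            if (sb.length() > 0) sb.append(',');
            sb.append(c.getKey()).append(':').append(c.getValue());
        }
        return sb.toString();
    }
}
```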
asivache
cf3e59eb4a
back to archive
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2478 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:00:38 +00:00
asivache
295d16572e
synch; will go back to archive in a sec
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2477 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:00:03 +00:00
asivache
e286313b67
Fix for reads that have an insertion as their last (mapped) cigar element (i.e. not followed by M)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2476 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:13:16 +00:00
hanna
05deb8796b
Simplify handling of reference sequence for unmapped reads. Improvement made based on a suggestion from Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2475 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:06:20 +00:00
rpoplin
96c4929b3c
Recalibrator now uses NestedHashMap instead of NHashMap. The keys are now nested hash maps instead of Lists of Comparables. This results in a big speedup (thanks Tim!). There is still a little bit of cleanup to do, but everything works now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2474 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:01:32 +00:00
asivache
bfd6bf9ec5
PileupWalker just got a new option: --showIndelPileups. When this option is used, two lines are printed for every genomic location that has indels associated with it: the first line is a conventional base pileup, the second line is an "extended event" (indel) pileup. The reference base in that second line is always set to "E" (for Extended), and the pileup string contains I, D, and . symbols for insertion, deletion, and no event, respectively. Only this simple short format for indel pileups is implemented so far.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2472 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:16:34 +00:00
asivache
9652692019
Modified to enable locus traversals to fire additional calls to the walker's map() with an alignment context filled with extended events (indels). A walker should override generateExtendedEvents() to return true, and it should make sure that it catches those additional indel pileups and processes them differently, as needed. If there are indels associated with a specific reference base, TWO map() calls will be issued in the locus traversal at that location: the first will have a context filled with a regular base pileup, and the second will provide a context filled with an indel pileup (pileup elements will have an insertion, deletion, or noevent type associated with them and will also carry information about the full length of the event and the inserted bases).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2471 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:13:25 +00:00
asivache
06eb576924
Can now be constructed with either a base pileup or an extended event (indel) pileup; has query methods checking what kind of pileup is served by the context, and getter methods return the appropriate pileup. TODO: while it is impossible right now to create a context that contains both types of pileups simultaneously, this restriction is only weakly enforced through the lack of an appropriate constructor. Either we keep it this way, or some getters may become ambiguous and will have to be fixed!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2470 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:07:29 +00:00
asivache
f445745c56
Pileup element and corresponding container class tweaked to represent pileups of extended events (indels) at a given locus. There's some redundancy with PileupElement and ReadBackedPileup (should we rename them to BasePileupElement and ReadBackedBasePileup?), so abstracting a basic interface/abstract base class from these classes could be considered in the future
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2469 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:03:39 +00:00
depristo
87e863b48d
Removed unused routines in duputils; moved duplicatequals to archive; docs for new duplicate traversal code; general code cleanup; bug fixes for combineduplicates; integration tests for combine duplicates walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2468 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 19:46:29 +00:00
depristo
29f94119d1
Fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2466 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 18:08:41 +00:00
ebanks
5fdf17fccb
Removed the VCF "NS" annotation (which wasn't working for pooled calls anyway) since it's ambiguous and not useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2465 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 17:30:47 +00:00
hanna
e32174fbc4
UnifiedGenotyper now works without -varout or -vf set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2464 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 16:46:24 +00:00
hanna
b125571a98
Intermediate check-in: transfer responsibility for wrapping the GenotypeWriter around the output stream to the output management code.
...
Currently, this will not work when neither -varout nor -vf is specified, but it should work in all other cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2463 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 16:11:11 +00:00
ebanks
aeb34758e6
Adding a validation stringency to the VCF writers (which defaults to STRICT). If set to SILENT, it will not throw an exception for (reasonable) off-spec requests but will instead ignore such requests and silently move on.
...
This change allows the pooled calculation model to work correctly with multiple threads. Boys, the Genotyper is now officially parallelized.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2462 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 15:33:53 +00:00
rpoplin
29a3d9b47a
AnalyzeCovariates also has to skip over NO_DINUC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2461 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 14:36:05 +00:00
depristo
fcc80e8632
Completely rewritten duplicate traversal, with fewer bugs, plus integration tests for the count duplicates walker validated on a TCGA hybrid capture lane.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:56:49 +00:00
hanna
d4ee999ef9
Creates files supplemental to the reference sequence, consumed by BWA.
...
ANN - Alternate form of the sequence dictionary. Should be created from a sequence dictionary with full contig names.
AMB - A map of 'holes' in the genome, aka runs of non-ACGTacgt bases. This skeletal implementation always reports no holes.
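The skeletal AMB writer reports no holes. For illustration only, here is how runs of non-ACGTacgt bases could be located in a contig; this Java sketch (class name hypothetical) records only a 0-based start and a length, and makes no claim about BWA's actual .amb layout:

```java
import java.util.ArrayList;
import java.util.List;

final class AmbiguityHoles {
    // One hole: 0-based offset of its first base and the number of bases in the run.
    static final class Hole {
        final int start;
        final int length;

        Hole(int start, int length) {
            this.start = start;
            this.length = length;
        }
    }

    private static boolean isAcgt(char c) {
        return "ACGTacgt".indexOf(c) >= 0;
    }

    // Returns every maximal run of characters outside ACGTacgt, in contig order.
    static List<Hole> findHoles(CharSequence contig) {
        List<Hole> holes = new ArrayList<>();
        int i = 0;
        while (i < contig.length()) {
            if (isAcgt(contig.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < contig.length() && !isAcgt(contig.charAt(i))) {
                i++;
            }
            holes.add(new Hole(start, i - start));
        }
        return holes;
    }
}
```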
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2455 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:40:44 +00:00
rpoplin
fcc52fbcd1
Fixed the build. Added missing import line.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2454 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:26:00 +00:00
ebanks
893c9c85fa
Applied the previous optimization to the diploid (non-pool) model, shaving 20% off its runtime. Moved some common functionality out to the joint estimate parent class.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2453 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:20:48 +00:00
rpoplin
92e3682991
Moved NHashMap to sting/utils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2452 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:57:32 +00:00
rpoplin
562db45fa5
Sites that were marked NO_DINUC no longer get dinuc-corrected but are still recalibrated using the other available covariates. Solid cycle is now the same as Illumina cycle pending an analysis that looks at the effect of PrimerRoundCovariate. Solid color space methods cleaned up to reduce number of calls to read.getAttribute(). Polished NHashMap sort method in preparation for move to core/utils. Added additional plots in AnalyzeCovariates to look at reported quality as a function of the covariate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2451 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:19:37 +00:00
asivache
2a704e83df
Reads now have new traversal flag: generateExtendedEvents(). Support added to GenomeAnalysisEngine and Walker. This is a silent and transparent framework change that no existing code is going to see. The actual code that makes use of the new flag (which is false by default) will be committed separately...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2450 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 19:52:44 +00:00
ebanks
c8d0e6e004
Optimization to pooled calculation model: stop calculating P(D|AF) if we are beyond the max likelihood such that subsequent likelihoods won't factor into the confidence score. Also, use new Pileup interface.
...
Pooled calling now takes less than half the time it used to.
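The early-exit idea can be sketched as below, assuming log10 P(D|AF) is unimodal in AF, so that once a value falls far enough below the running maximum, later values cannot matter. The `cutoff` parameter and the class name are hypothetical; the commit does not say what threshold is used.

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

final class EarlyStopAfScan {
    /**
     * Evaluates log10 P(D|AF) for AF = 0, 1, ..., maxAF and returns the values computed.
     * Stops as soon as a value falls more than `cutoff` log10 units below the best so far,
     * on the assumption that the likelihood only keeps decreasing after its peak.
     */
    static double[] scan(IntToDoubleFunction log10Likelihood, int maxAF, double cutoff) {
        double[] values = new double[maxAF + 1];
        double best = Double.NEGATIVE_INFINITY;
        int evaluated = 0;
        for (int af = 0; af <= maxAF; af++) {
            double value = log10Likelihood.applyAsDouble(af);
            values[evaluated++] = value;
            if (value > best) {
                best = value;
            } else if (value < best - cutoff) {
                break; // the remaining AFs would contribute negligibly
            }
        }
        return Arrays.copyOf(values, evaluated);
    }
}
```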
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2449 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 18:39:55 +00:00
ebanks
b1ac4b81d5
Optimization: look up diploid genotypes from a static matrix instead of creating them on the fly (with String.format); bases no longer need to be ordered appropriately
...
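A sketch of the lookup-table idea in Java, assuming genotypes are two-letter strings over ACGT in alphabetical order (the class and method names are hypothetical). Because both [i][j] and [j][i] point at the same string, callers no longer need to order the bases first:

```java
final class DiploidGenotypeTable {
    private static final String BASES = "ACGT";
    // Built once at class-load time instead of formatting a new String per call.
    private static final String[][] GENOTYPES = new String[4][4];

    static {
        for (int i = 0; i < 4; i++) {
            for (int j = i; j < 4; j++) {
                String g = "" + BASES.charAt(i) + BASES.charAt(j);
                GENOTYPES[i][j] = g;
                GENOTYPES[j][i] = g; // same instance either way round: base order doesn't matter
            }
        }
    }

    static String genotype(char base1, char base2) {
        int i = BASES.indexOf(Character.toUpperCase(base1));
        int j = BASES.indexOf(Character.toUpperCase(base2));
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("not a base pair: " + base1 + base2);
        }
        return GENOTYPES[i][j];
    }
}
```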
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2448 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 17:28:51 +00:00
andrewk
57516582c2
Converter from HapMap chip genotype data to VCF added; HapMapGenotypeROD adjusted to not convert from Hg18 to b36 formatting of contigs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2447 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 01:36:08 +00:00
ebanks
d2770f380c
Writing calls to standard out now works again (it got broken when we introduced parallelization)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2446 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-27 04:36:45 +00:00
ebanks
12990c5e7a
Added qual-by-depth annotation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2445 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-25 02:30:30 +00:00
ebanks
0571d9dcb9
Point MAX_QUAL_SCORE to SAMUtils.MAX_PHRED_SCORE.
...
Also, array size for caches should be max score + 1.
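To illustrate the off-by-one: a cache indexed by quality score from 0 through the maximum needs max + 1 slots. In this sketch the maximum is hard-coded to 93 as a stand-in for SAMUtils.MAX_PHRED_SCORE, and the cached values (Phred error probabilities) and class name are assumed examples:

```java
final class QualCache {
    // Hard-coded stand-in; the commit points MAX_QUAL_SCORE at SAMUtils.MAX_PHRED_SCORE instead.
    static final int MAX_QUAL_SCORE = 93;
    // Valid indices run 0..MAX_QUAL_SCORE inclusive, hence MAX_QUAL_SCORE + 1 slots.
    private static final double[] ERROR_PROB = new double[MAX_QUAL_SCORE + 1];

    static {
        for (int q = 0; q <= MAX_QUAL_SCORE; q++) {
            ERROR_PROB[q] = Math.pow(10.0, -q / 10.0); // Phred: P(error) = 10^(-Q/10)
        }
    }

    static double errorProbability(int qual) {
        return ERROR_PROB[qual];
    }
}
```

With an array of only MAX_QUAL_SCORE slots, looking up the maximum score itself would throw ArrayIndexOutOfBoundsException.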
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2444 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:47:32 +00:00
ebanks
438d21842a
The new recalibrator had been mimicking the behavior of the old one in that if there was no dinuc available (following a no-call base or at either end of a read), it didn't try to recalibrate. Now that Ryan has modularized the system, we no longer need to skip the base completely (we just need to skip the dinuc value)... which is good because the Picard people complained after realizing that cycle #1 never got recalibrated.
...
The major effects of this commit are as follows:
1. We no longer skip any good bases (of course, this change alone breaks every single integration test).
2. The dinuc covariate returns a "no dinuc" value for the first base of a read (but not for the last base anymore, since there is a valid dinuc) or if the previous base is a bad base (e.g. 'N').
I've done a bunch of testing on real data and everything looks right; however, let's wait until the recalibrator guru gets back from vacation next week and can double-check everything before shipping this out in another early access release.
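A hedged sketch of the dinuc rule described in point 2, assuming bases are given in sequencing order and that only A, C, G, and T count as good bases. The `NO_DINUC` sentinel and the class name are hypothetical:

```java
final class DinucSketch {
    // Hypothetical sentinel for "no dinuc available"; the real covariate's value may differ.
    static final String NO_DINUC = "NO_DINUC";

    private static boolean isGoodBase(byte b) {
        return b == 'A' || b == 'C' || b == 'G' || b == 'T';
    }

    // Dinuc for the base at `offset`: the previous base followed by the current base.
    // The first base has no previous base, and a bad previous base (e.g. 'N') gives no dinuc;
    // the last base of the read still gets a valid dinuc.
    static String dinuc(byte[] bases, int offset) {
        if (offset == 0 || !isGoodBase(bases[offset - 1])) {
            return NO_DINUC;
        }
        return "" + (char) bases[offset - 1] + (char) bases[offset];
    }
}
```

Only the dinuc value is skipped in these cases; the base itself is still recalibrated using the other covariates.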
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2443 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:41:29 +00:00
ebanks
aaf674d9db
Cleaned up this annotation.
...
Still experimental. As of now, it's not useful. More analysis is needed to determine how to handle cases where UG is unsure whether a sample is het or hom.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2442 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 03:06:46 +00:00
ebanks
6df40876a3
Un-reverted Matt's previous changes and fixed integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2441 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 02:47:00 +00:00
hanna
2bd0b1bbf7
After further review, it's unclear that my patch in RecalDataManager was the right choice. Reverting.
...
Also updating other IntervalCleanerIntegrationTest failures that were masked by my first patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2440 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 00:32:33 +00:00
hanna
98c268483e
Fixed issues with the integration tests:
...
1) sam-jdk apparently no longer supports custom tags with type int[] values.
2) BAM output for indel cleaner integration test changed in a way that's so subtle it can't be seen after converting the output to .sam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2439 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 23:12:22 +00:00
aaron
b134e0052f
added changes to the code to allow different types of interval merging,
...
1: all overlapping and abutting intervals merged (ALL),
2: just overlapping, not abutting intervals (OVERLAPPING_ONLY),
3: no merging (NONE). This option is not currently allowed; it will throw an exception. Once we're more certain that unmerged lists are going to work in all cases in the GATK, we'll enable it.
The command line option is --interval_merging or -im
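A sketch of the three merging rules, assuming closed, 1-based intervals on a single contig (so [1,5] and [6,10] abut). The enum values mirror the option values above, but the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class IntervalMergingSketch {
    enum IntervalMergingRule { ALL, OVERLAPPING_ONLY, NONE }

    // Closed interval [start, stop] on one contig.
    static final class Interval {
        final int start;
        final int stop;

        Interval(int start, int stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return start + "-" + stop;
        }
    }

    static List<Interval> merge(List<Interval> input, IntervalMergingRule rule) {
        if (rule == IntervalMergingRule.NONE) {
            // Mirrors the commit: unmerged lists are not allowed yet.
            throw new UnsupportedOperationException("NONE interval merging is not supported yet");
        }
        List<Interval> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt((Interval i) -> i.start).thenComparingInt(i -> i.stop));
        List<Interval> out = new ArrayList<>();
        for (Interval next : sorted) {
            if (!out.isEmpty()) {
                Interval last = out.get(out.size() - 1);
                boolean overlaps = next.start <= last.stop;
                boolean abuts = next.start == last.stop + 1;
                if (overlaps || (rule == IntervalMergingRule.ALL && abuts)) {
                    out.set(out.size() - 1, new Interval(last.start, Math.max(last.stop, next.stop)));
                    continue;
                }
            }
            out.add(next);
        }
        return out;
    }
}
```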
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2437 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:59:14 +00:00
alecw
159778416c
In TableRecalibrationWalker, update UQ tag if it was present in the original SAMRecord. This required a new sam.jar, which caused some other files to need to be changed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2435 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:42:36 +00:00
hanna
87ff2b15d4
First step in introducing a patch to Picard: create our ideal interface into the BAM file for sharding.
...
This commit can iterate over the BAM file, pulling out information about the blocks in the file without actually loading or decompressing the reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2434 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:35:08 +00:00
ebanks
dc96879861
2 separate changes which both affect lots of UG integration md5s, so I'm committing them together:
...
1. allele balance annotation is now weighted by genotype quality (so we don't get misled by borderline het calls)
2. Updates to the Unified Genotyper for parallelization:
a. verbose writing now works again; arg was moved from UAC to UG
b. UG checks for commands that don't work with parallelization
c. some cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2432 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 19:03:56 +00:00
ebanks
872a9d1c7b
I'm making this change now (as opposed to waiting until Monday) to honor Tim's request.
...
The cycle covariate is now first/second of pair aware. I'm taking it on faith from both Chris Hartl (waiting on slides from him) and Tim that this is the right thing to do. We'll have Ryan confirm it all next week.
The only change is that if a read is the second of a pair, we multiply the cycle by -1 (a simple way of separating its index from that of its mate).
Of course, this broke all integration tests.
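A sketch of the sign trick, with assumed details: cycles are 1-based (so negation never lands on 0 or on another read's positive cycle), and reverse-strand reads count cycles from the other end. The class and method names are hypothetical:

```java
final class PairAwareCycle {
    /**
     * 1-based machine cycle for the base at `offset` (0-based, in alignment orientation).
     * Reverse-strand reads were sequenced from the other end, so their cycles count backwards.
     * Second-of-pair reads get negative cycles so they never share a bin with first-of-pair reads.
     */
    static int cycle(int offset, int readLength, boolean negativeStrand, boolean secondOfPair) {
        int cycle = negativeStrand ? readLength - offset : offset + 1;
        return secondOfPair ? -cycle : cycle;
    }
}
```

With 0-based cycles, the first cycle of both mates would map to 0 and collide, which is why this sketch counts from 1.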
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2431 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 16:26:43 +00:00
hanna
e29e8e52b9
Multithreading support for the unified genotyper. Tests on a 10Mbase region of pilot 1 show a 6.8x improvement when running 8-way parallel.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2430 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 00:48:06 +00:00
kiran
164a94a3d0
Modified the walker documentation so that the stray punctuation wouldn't cause the GATK to stop parsing the help documentation early (aka I changed one word).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2429 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 20:50:01 +00:00
kiran
4ee6a478e3
Creates a table of reference allele percentage and alternate allele percentage at Hapmap-chip sites in a BAM file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2428 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 20:43:44 +00:00
ebanks
03bf75e335
Now implements TreeReducible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2427 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 17:52:51 +00:00
hanna
0d890e1bf0
Rework Eric's output management code, given that the behavior of the UG changes drastically depending on its output format.
...
The current implementation is probably a bit overkill-ish, and we can whittle it down to what's absolutely necessary.
Writing VCFs to the 'out' protected printstream may not work at this moment.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2425 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 00:33:43 +00:00
ebanks
f448a263e9
The cleaner now cleans duplicate reads (instead of ignoring them), although it doesn't include them when scoring the ref or alt consensuses
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2424 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 21:01:55 +00:00
ebanks
cf303810d3
VCF reader now creates the correct type of header line for each header type
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 20:39:06 +00:00
ebanks
e06dfe44c4
Check for null platform (even when the read group isn't null) and assign it the default platform if it is
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2420 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 07:01:41 +00:00
ebanks
87e5a41964
Fixed a bug that accounted for a bunch of my remaining mis-cleaned indels.
...
Also, slightly optimized the cleaner by using readBases (instead of readString) and caching cigar element lengths.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2419 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 05:46:16 +00:00
hanna
b780ffb34a
Add a getFormat() method to get the output format from the writer.
...
The need for this call suggests that I may be thinking about the typing of the GenotypeWriter object the wrong way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2418 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 01:46:26 +00:00
hanna
11cbfcec9c
Get rid of the backlink from ArgumentDefinitions to ArgumentSources.
...
This will help in the future with multiple source -> single definition mapping sets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2417 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 00:39:36 +00:00
hanna
9e53c06328
First revision of command-line argument support for GenotypeWriter. Also, fixed the damn build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2416 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 19:19:23 +00:00
ebanks
4ff61097cf
Trivial change: < -> <=
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2415 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:35:27 +00:00
ebanks
566b556b50
Give user ability to turn off max allowed interval size
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2414 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:20:22 +00:00
ebanks
a5f75cbfd4
The previous commit broke the build, so this is a temporary patch to get it to compile. ConcordanceTruthTable should use enums (esp. now that all of the concordance variables need to be public), but VariantEval will need to be rewritten soon anyway, so I'll just push it off until then.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2413 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 02:34:41 +00:00
depristo
ee8bcdc61d
PooledConcordance calculations have been reformatted and bugs fixed. Now properly handles monomorphic sites. Also now works correctly with the -G option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2412 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:22:36 +00:00
depristo
9bf2d12c64
Misc. improvements to the LMW code. Support for emitting all sites, regardless of genotype. Min and max quality scores.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2411 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:20:57 +00:00
aaron
7e0f69dab5
Changed the GLF record to store its contig name and position in each record instead of in the Reader. Integration tests all stay the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:54:56 +00:00
hanna
80b3eb85fa
Fixed curiously epic failure in read-backed pileup: size() mismatched the numReads-numDeletions at that locus in the case where includeReadsWithDeletionsAtLoci == false, causing failures including bad output from pileup walker. Also fixed up ValidatingPileup to run with the new ReadBackedPileup instead of just compiling successfully.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2409 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:52:44 +00:00
rpoplin
fdf542c214
The CycleCovariate for 454 data is now the TACG flow cycle. That is, each flow grabs all the T's, A's, C's, and G's in order in a single cycle. This is changed from incrementing the cycle whenever there is a discontinuous nucleotide along the direction of the read.
...
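The TACG flow-cycle assignment can be sketched as follows, assuming forward-strand bases, flows that start at T, and no key sequence; non-ACGT bases simply inherit the current cycle. The class name is hypothetical:

```java
final class FlowCycleSketch {
    private static final String FLOW_ORDER = "TACG";

    // Assigns each base the index of the TACG flow cycle in which it would have been read.
    // Within one cycle the instrument flows T, then A, then C, then G, so a homopolymer run
    // (or any bases appearing in TACG order) stays in the same cycle.
    static int[] flowCycles(String read) {
        int[] cycles = new int[read.length()];
        int cycle = 0;
        int flow = 0; // index into FLOW_ORDER
        for (int i = 0; i < read.length(); i++) {
            char base = Character.toUpperCase(read.charAt(i));
            if (FLOW_ORDER.indexOf(base) < 0) { // e.g. 'N': keep the current cycle
                cycles[i] = cycle;
                continue;
            }
            while (FLOW_ORDER.charAt(flow) != base) {
                flow++;
                if (flow == FLOW_ORDER.length()) { // wrapped past G: next cycle begins
                    flow = 0;
                    cycle++;
                }
            }
            cycles[i] = cycle;
        }
        return cycles;
    }
}
```

For example, "TACGT" gives cycles 0,0,0,0,1, while the older scheme would have bumped the cycle at every change of nucleotide.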
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2408 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:39:51 +00:00
aaron
c39675d2c1
VCFTool.java got left off of the last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2407 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 21:33:53 +00:00
ebanks
4ea31fd949
Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 19:16:41 +00:00
ebanks
eeddf0d08e
Adding sample utils for convenience methods to pull out samples from e.g. SAMFileHeader or Genotype objects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2405 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 18:51:21 +00:00
chartl
79b997f43d
Minor fix to getValue (thanks Ryan!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2404 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:45:51 +00:00
aaron
9971a8da9a
adding a check to the RodVCF to ensure that records are in-order in the underlying VCF file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2403 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:24:45 +00:00
chartl
38563bbc2d
The values used to be integers (-1 for unpaired, 0 for unmapped, 1 for first, 2 for second), but I switched to strings before committing so it would be clearer. Forgot to update the OTHER getValue method.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2402 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:05:14 +00:00
chartl
7b5e332ff3
Added - PairedQualityScoreCountsWalker: counts quality scores (e.g. as a histogram) on first reads of a pair and second reads of a pair. Turns out there's a consistent difference in quality scores; even after recalibrating without the pair ordering as a covariate (there's a bit of averaging -- but not as much as I initially thought).
...
Added - A paired read order covariate to use with recalibration. Currently experimental: for instance, what's a proper pair versus just a pair? Nobody should use this one...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2401 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:01:01 +00:00
ebanks
4f59bfd513
Updates to the various GenotypeWriters to make them do simple things like write records (plus allow GLFReader to close).
...
Adding first pass of stub and storage classes for the GenotypeWriters so that UG can be parallelized. Not hooked up yet, so UG is unchanged.
The mergeInto() code in the storage class is ugly, but it's all Tribble's fault. We can clean it up later if this whole thing works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2400 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 07:20:23 +00:00
ebanks
94f5edb68a
1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString)
...
2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant())
3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it.
4. Move 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization.
5. Improved some of the UG integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 04:14:14 +00:00
jmaguire
98839193b7
compatibility with VCF lib's switch to GenomeLoc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2397 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:52:48 +00:00
jmaguire
8787dd4c5e
Various and sundry additions to VCF tools. Some useful to the general public, some one-offs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2396 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:35:45 +00:00
rpoplin
6fbf77be95
Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 20:07:26 +00:00
ebanks
c75ec67f84
When called as a standalone, VariantAnnotator now emits samples in sorted (as opposed to random) order in VCFs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2392 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:01:08 +00:00
rpoplin
aa86f3710d
Updating HomopolymerCovariate to only count the consecutive previous bases. I left the old code in, commented out, in case somebody wants to worry about carry-forward homopolymer problems.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2391 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 18:25:09 +00:00
hanna
9143822822
Fix half-hearted attempt to try to move classes from package to package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2389 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:41:42 +00:00
asivache
acb4d477da
sync...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2387 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:03:01 +00:00
asivache
ba86508854
remove debug print command
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2386 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 00:00:01 +00:00
asivache
d72d332239
1) changed to search specifically for D and I cigar elements (and to process/ignore H, S, P elements properly) and print out only intervals that encompass actual indels. There's still at most one interval generated per read, which is the smallest interval that covers ALL indels (D or I elements) present in the read; 2) if an interval (and thus the original read itself and the indels in it) sticks out beyond the end of the chromosome, the read is ignored and this interval is NOT printed to the output; instead, a warning is printed to STDOUT (should we send it to logger.warn() instead?)
...
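The interval computation in point 1 can be sketched as below. Cigar elements are passed as parallel arrays (an assumption of this sketch, not the samtools/Picard API), the class name is hypothetical, and representing an insertion by its two flanking reference bases is likewise an assumption:

```java
final class IndelSpan {
    /**
     * Returns {start, stop} (1-based, inclusive reference coordinates) of the smallest interval
     * covering every D and I element of a read's cigar, or null if the read has no indels.
     */
    static int[] indelSpan(int alignmentStart, char[] ops, int[] lengths) {
        int refPos = alignmentStart; // next reference base to be consumed
        int start = Integer.MAX_VALUE;
        int stop = Integer.MIN_VALUE;
        for (int k = 0; k < ops.length; k++) {
            int len = lengths[k];
            switch (ops[k]) {
                case 'M': case '=': case 'X': case 'N':
                    refPos += len; // consume reference bases
                    break;
                case 'D': // deleted reference bases are covered directly
                    start = Math.min(start, refPos);
                    stop = Math.max(stop, refPos + len - 1);
                    refPos += len;
                    break;
                case 'I': // no reference consumed; cover the two flanking bases
                    start = Math.min(start, refPos - 1);
                    stop = Math.max(stop, refPos);
                    break;
                case 'S': case 'H': case 'P':
                    break; // consume no reference bases
                default:
                    throw new IllegalArgumentException("unknown cigar op: " + ops[k]);
            }
        }
        return start <= stop ? new int[] {start, stop} : null;
    }
}
```

The end-of-chromosome check in point 2 would then compare the returned stop against the contig length and skip the read, with a warning, when it runs past the end.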
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2385 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 23:29:07 +00:00
hanna
5b78354efd
Fixed NPE in index check with RefWalkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2384 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 22:37:45 +00:00
hanna
e6127cd6c5
Temporary hack for Tim Fennell: introduce a sharding strategy that stuffs all data into a single shard for cases when the index file isn't available.
...
Works for the case in question, but is not guaranteed to work in general. Will be replaced once the new sharding system comes online.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2383 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:55:42 +00:00
ebanks
bef1c50b3b
Some cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2382 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:41:06 +00:00
ebanks
bb92e31118
Optimizations:
...
1. push the ReadBackedPileup filtering up into the ReadFilters for read-based filters
2. stop querying the cigar for its length (just do it once)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2381 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:39:58 +00:00
andrewk
36875fca89
Update documentation in the new help system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2380 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:33:12 +00:00
hanna
ee47eb4367
Make filters used available to the walker via getToolkit().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2379 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:26:04 +00:00
ebanks
b626fc0684
Joint Estimate is now the default calculation model.
...
Reworked all of the integration tests so that they're now more comprehensive, cover more of what we want to test, and don't take forever to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2376 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 19:41:02 +00:00
ebanks
e051311e8c
Added convenience methods in RodVCF to pull out all of the VCF data from the VCFRecord (e.g. getID(), getSamples(), getInfoValues())
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2374 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:58:41 +00:00
ebanks
bb312814a2
UG is now officially in the business of making good SNP calls (as opposed to being hyper-aggressive in its calls and expecting the end-user to filter).
...
Bad/suspicious bases/reads (high mismatch rate, low MQ, low BQ, bad mates) are now filtered out by default (and not used for the annotations either), although this can all be turned off.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2373 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:28:09 +00:00
aaron
af440943a4
Fixing a bug that Steven uncovered; we had an ambiguous contract for peek() in PushbackIterator, and SeekableRODIterator wasn't checking whether its PushbackIterator's hasNext() was true before calling peek().
...
Changed peek() to element() to be consistent with the Java standards of the Queue and Stack classes (element() throws an exception if a record isn't available).
Also updated some of the ROD iterator next() methods to throw NoSuchElementException if next() is called when a record isn't available.
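A minimal single-slot pushback iterator showing the element() contract described above. The class name and the pushback() method are hypothetical; only NoSuchElementException and the Iterator interface come from the JDK:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class PushbackIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private T pushedBack;          // element returned to the front of the stream
    private boolean hasPushedBack; // tracked separately so null elements are allowed

    PushbackIteratorSketch(Iterator<T> underlying) {
        this.underlying = underlying;
    }

    @Override
    public boolean hasNext() {
        return hasPushedBack || underlying.hasNext();
    }

    @Override
    public T next() {
        if (hasPushedBack) {
            hasPushedBack = false;
            T t = pushedBack;
            pushedBack = null;
            return t;
        }
        return underlying.next(); // a well-behaved iterator throws NoSuchElementException here
    }

    // Like Queue.element(): returns the next element without consuming it, and throws
    // NoSuchElementException (rather than returning null) when none is available.
    T element() {
        if (!hasPushedBack) {
            if (!underlying.hasNext()) {
                throw new NoSuchElementException("no element available");
            }
            pushedBack = underlying.next();
            hasPushedBack = true;
        }
        return pushedBack;
    }

    void pushback(T t) {
        if (hasPushedBack) {
            throw new IllegalStateException("only one element can be pushed back");
        }
        pushedBack = t;
        hasPushedBack = true;
    }
}
```

A caller that skips the hasNext() check now fails loudly instead of silently receiving null.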
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2372 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 23:04:40 +00:00
andrewk
1035abc85f
Add minimum base quality thresholding to depth of coverage via getBaseAndMappingFilteredPileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2371 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 22:58:30 +00:00
sjia
2deae95df9
Updated documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2370 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:31:47 +00:00
hanna
555976d575
One more walker with formatting to fix.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2369 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:23:13 +00:00
hanna
cf46472419
Fix up Sherman's new docs in compliance with javadoc specs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2368 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:20:38 +00:00
sjia
df79ed8db1
Updated documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2367 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:53:41 +00:00
sjia
a80a5f1036
Updated documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2366 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:52:08 +00:00
sjia
18f61d2586
Updated documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2365 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:45:19 +00:00
sjia
5974c42468
Updated documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2364 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:41:35 +00:00
sjia
d8cfd707bc
Updated documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2363 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:35:18 +00:00
sjia
4322beeb35
Updated documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2362 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:33:38 +00:00
sjia
4148991d81
Now also encodes amino acids, includes documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2361 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:26:56 +00:00
ebanks
9b0bdbbf29
Fix for homopolymer bug: ref was lowercase, alt allele was uppercase, so alt != ref. Yuck.
...
This is a temporary fix - pushed more elegant solution over to Matt.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2360 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 19:02:23 +00:00
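The underlying issue is a case-sensitive base comparison: reference FASTA files often use lowercase letters for soft-masked (e.g. repeat) sequence, while called alleles are usually uppercase. A minimal sketch of a case-insensitive comparison; BaseCompareSketch and its methods are hypothetical names, not the fix that was actually committed.

```java
/**
 * Hypothetical helper: compare bases case-insensitively so that a lowercase
 * (soft-masked) reference base still matches the same uppercase allele.
 */
final class BaseCompareSketch {
    static boolean basesEqual(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    /** True only if the alternate base really differs from the reference base. */
    static boolean isNonReference(char refBase, char altBase) {
        return !basesEqual(refBase, altBase);
    }
}
```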
depristo
a810586418
Check-in without javadoc = smackdown
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2359 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 15:32:39 +00:00
ebanks
b234019cf5
Re-added locus printing suppression to DoC walker
...
(and removed unused import from UG)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2358 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:50:56 +00:00
depristo
0d2a761460
Bugfix for minBaseQuality to ignore deletion reads. LocusMismatch walker now allows us to skip every nth eligible site
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2357 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:38:39 +00:00
ebanks
bf7bab754e
Made getPileupWithoutMappingQualityZeroReads() and getPileupWithoutDeletions() more efficient, per Mark's cue.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2356 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 04:35:21 +00:00
ebanks
874552ff75
Pull the genotype (and genotype quality) calculation out of the VCF code and into the Genotyper.
...
[Also, enable Mark's new UG arguments]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2355 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 04:29:28 +00:00
depristo
2cbc85cc7a
min mapping quality and min base quality arguments for UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2354 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 03:57:27 +00:00
depristo
faa638532a
Correct location
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2353 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:42:21 +00:00
depristo
1da97ebb85
Walker for calculating non-independent base errors, v1. Will be moved to somewhere not in core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2352 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:40:15 +00:00
chartl
b42fc905e8
Added - new tests (Hapmap was re-added)
...
Modified - Hapmap now takes a -q command to filter out variants by quality
Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions
Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:57:20 +00:00
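The log only says that cumBinomialProbLog now uses BigDecimal; the signature, argument order and summation below are assumptions. A minimal sketch of the idea: accumulate P(start <= X <= end) for X ~ Binomial(n, p) in BigDecimal and take log10 only at the end, so terms far below double's range (e.g. 0.5^2000) neither underflow to zero nor vanish in the sum.

```java
import java.math.BigDecimal;
import java.math.MathContext;

/** Hypothetical sketch, not the GATK MathUtils source. */
final class CumBinomialSketch {
    private static final MathContext MC = MathContext.DECIMAL128;

    /** P(X = k); the coefficient is built incrementally as C(n, i+1) = C(n, i) * (n - i) / (i + 1). */
    static BigDecimal binomialProb(int n, int k, BigDecimal p) {
        BigDecimal coefficient = BigDecimal.ONE;
        for (int i = 0; i < k; i++) {
            coefficient = coefficient.multiply(BigDecimal.valueOf(n - i))
                                     .divide(BigDecimal.valueOf(i + 1), MC);
        }
        BigDecimal q = BigDecimal.ONE.subtract(p);
        return coefficient.multiply(p.pow(k, MC), MC).multiply(q.pow(n - k, MC), MC);
    }

    /** log10 of P(start <= X <= end); returns -Infinity if the probability is exactly zero. */
    static double cumBinomialProbLog10(int start, int end, int n, double p) {
        BigDecimal bp = BigDecimal.valueOf(p);
        BigDecimal sum = BigDecimal.ZERO;
        for (int k = start; k <= end; k++) {
            sum = sum.add(binomialProb(n, k, bp), MC);
        }
        // log10(x) = log10(unscaled) - scale; 17 significant digits keep the unscaled part within double range
        BigDecimal rounded = sum.round(new MathContext(17));
        return Math.log10(rounded.unscaledValue().doubleValue()) - rounded.scale();
    }
}
```

For comparison, Math.pow(0.5, 2000) is 0.0 in double arithmetic, so a plain-double version would return -Infinity for that case.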
rpoplin
8e44bfd2ef
CycleCovariate and PrimerRoundCovariate now correctly handle negative strand 454 and SOLID reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2349 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:52:30 +00:00
ebanks
c7b23d6ca5
Now that VCFGenotypeRecords implement SampleBacked (as they should), a quick fix was needed to get the GenotypeConcordance working when no direct samples were provided in a samples file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2348 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 04:27:16 +00:00
asivache
bd7b07f3f1
added PrimitivePair.Long and a few shortcut utility methods to PrimitivePairs: add(pair), subtract(pair), assignFrom(pair)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2347 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 00:15:44 +00:00
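A minimal sketch of what a long-valued primitive pair with in-place add/subtract/assignFrom might look like; the outer class name, field names and void return types are assumptions, since the log only lists the method names.

```java
/** Hypothetical sketch of a mutable pair of primitive longs; names are illustrative only. */
final class PrimitivePairSketch {
    static final class Long {
        long first;
        long second;

        Long(long first, long second) {
            this.first = first;
            this.second = second;
        }

        /** Element-wise this += other. */
        void add(Long other) {
            first += other.first;
            second += other.second;
        }

        /** Element-wise this -= other. */
        void subtract(Long other) {
            first -= other.first;
            second -= other.second;
        }

        /** Copies both values from other into this pair. */
        void assignFrom(Long other) {
            first = other.first;
            second = other.second;
        }
    }
}
```

Working on primitives in place avoids boxing and a new allocation per update, which is one plausible reason to prefer this over an immutable Pair<Long, Long> when accumulating counts at every locus.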
ebanks
97618663ef
Refactored and generalized the VCF header info code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 21:02:45 +00:00
depristo
05b8782d5f
Documentation updates. Moved CountX.java walkers to QC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2345 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 18:40:22 +00:00
depristo
92307361a4
In preparation for move
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2344 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 18:28:06 +00:00
ebanks
45199136f0
Completed my documentation responsibilities - based on Mark's reasonable assignment and not the one Matt made up while on Meth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2342 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 04:13:30 +00:00
ebanks
bd2a46ab4c
I want to move over to hpprojects tonight, so I'm checking in various changes all in one go:
...
1. Initial code for annotating calls with the base mismatch rate within a reference window (still needs analysis).
2. Move error checking code from rodVCF to VCFRecord.
3. More improvements to SNP Genotype callset concordance.
4. Fixed some comments in Variation/Genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2341 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 02:52:18 +00:00
kiran
2748eb60e1
Added short documentation for each class so that it appears in the walker command-line documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2340 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 21:41:07 +00:00
rpoplin
78e94b5a84
TableRecalibration now puts the full list of walker arguments into the PG tag of the bam file it creates. Thanks Matt and Eric. Also, the default nback for the HomopolymerCovariate is 8, down from 10.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2339 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 17:29:41 +00:00
rpoplin
014013630f
Added hierarchy to the covariate classes: Required, Standard, and Experimental. Required covariates (read group and reported quality) are added for the user whether or not they are specified in the -cov list. There is now a -standard option in CountCovariates which adds all of the standard covariates, so the user doesn't have to type them all out or even know which ones are standard. Logger output reports which covariates are being used, and the list of covariates used is also added to the PG tag in the bam file produced by TableRecalibration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2338 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 16:34:05 +00:00
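The selection rule described above (required covariates always on, -standard adds the standard tier, -cov adds anything else) can be sketched as a small registry. The tier assignments and the ReadGroupCovariate, QualityScoreCovariate and DinucCovariate names are assumptions for illustration; only CycleCovariate and HomopolymerCovariate appear elsewhere in this log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical three-tier covariate registry; tier assignments are illustrative. */
final class CovariateSelectionSketch {
    enum Tier { REQUIRED, STANDARD, EXPERIMENTAL }

    static final Map<String, Tier> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("ReadGroupCovariate", Tier.REQUIRED);
        REGISTRY.put("QualityScoreCovariate", Tier.REQUIRED);
        REGISTRY.put("CycleCovariate", Tier.STANDARD);
        REGISTRY.put("DinucCovariate", Tier.STANDARD);
        REGISTRY.put("HomopolymerCovariate", Tier.EXPERIMENTAL);
    }

    /**
     * Required covariates are always included, -standard adds every STANDARD one,
     * and anything requested via -cov is appended. Duplicates are dropped, order preserved.
     */
    static List<String> select(List<String> requested, boolean useStandard) {
        Set<String> chosen = new LinkedHashSet<>();
        for (Map.Entry<String, Tier> e : REGISTRY.entrySet()) {
            if (e.getValue() == Tier.REQUIRED
                    || (useStandard && e.getValue() == Tier.STANDARD)) {
                chosen.add(e.getKey());
            }
        }
        for (String name : requested) {
            if (!REGISTRY.containsKey(name)) {
                throw new IllegalArgumentException("unknown covariate: " + name);
            }
            chosen.add(name);
        }
        return new ArrayList<>(chosen);
    }
}
```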
hanna
6955b5bf53
Cleanup of the doc system, and introduce Kiran's concept of a detailed summary
...
below the specific command-line arguments for the walker. Also introduced
@help.summary to override summary descriptions if required.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2337 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 04:04:37 +00:00
hanna
cdfe204d19
Incorporated feedback from Kiran. Use the Javadoc first sentence extraction capability to just show the first sentence from each line of Javadoc. @help.description can still be used to produce exceptionally verbose descriptions.
...
Also increased the line width as much as I could tolerate (100 characters -> 120 characters).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2336 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 21:59:55 +00:00
rpoplin
4fa4e95fbc
Updated AnalyzeCovariates to extend org.broadinstitute.sting.utils.cmdLine.CommandLineProgram and use the standard argument parsing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2335 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 21:57:18 +00:00
kiran
38d9f7b903
Renamed ReferenceContext's getSimpleBase() method to getBaseIndex()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2334 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 20:14:39 +00:00
aaron
09811b9f34
Now that we always output the VCF header, make sure that we correctly handle the situation where there are no records in the file. Added unit tests as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2333 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:51:05 +00:00
hanna
0da2105e3c
Moving DuplicateQualsWalker to oneoffprojects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2332 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:22:32 +00:00
rpoplin
60c3eb4b60
Added help.description to the recalibration walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2331 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:02:29 +00:00
ebanks
2ea7632b76
The SNP genotype concordance module is now more comprehensive.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2330 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 18:34:33 +00:00
hanna
590aeee7d2
Documentation for more basic walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2329 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 18:15:40 +00:00
hanna
d1815f3559
More documentation for walkers that I'm familiar with in the collection of core walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2328 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 18:02:33 +00:00
hanna
956c36a2c8
Help for the qc package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2327 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:32:47 +00:00
hanna
450ea233a5
Docs for the basic walkers: CountLoci, CountReads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2326 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:17:34 +00:00
hanna
f97ac939fa
Punch up the help documentation for CombineDuplicates.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2325 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:09:35 +00:00
aaron
86dc98bfb5
update the documentation for CombineDuplicates for the new help system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2324 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:01:42 +00:00
aaron
420725441a
documentation updates for the new help system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2323 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 16:15:44 +00:00
hanna
23d96b1d43
Help system content for the alignment module.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2322 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 16:01:25 +00:00
ebanks
2de7e1a178
Move VariantAnnotator over to use a StratifiedAlignmentContext split by sample.
...
The only major difference is that we are now able to get accurate allele balance ratios.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2321 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 05:28:28 +00:00
depristo
8f7554d44f
A few improvements to pooled concordance calculations. Now will show you FN with the -V option. BasicGenotype's toString() now prints out a reasonable representation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2320 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 23:09:10 +00:00
aaron
f64a4c66ac
some tweaks for the GATK paper genotyper to better work with shared memory parallelization, added documentation changes for Matt's new help system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2319 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 22:33:51 +00:00
andrewk
a7cd172628
Added 8x coverage field and minimum base quality command line option in order to be able to compare to U. Wash. exome metrics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2318 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 22:14:44 +00:00
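A minimal sketch of an "8x coverage" style metric with a minimum base quality: drop bases below the threshold, then report the fraction of loci whose remaining depth reaches 8. The class and method names are hypothetical, and representing each locus as an array of base qualities is an assumption made to keep the example self-contained.

```java
/** Hypothetical sketch of a quality-filtered depth threshold metric. */
final class CoverageThresholdSketch {
    /** Depth at one locus after discarding bases below minBaseQuality. */
    static int filteredDepth(byte[] baseQuals, int minBaseQuality) {
        int depth = 0;
        for (byte q : baseQuals) {
            if (q >= minBaseQuality) {
                depth++;
            }
        }
        return depth;
    }

    /** Fraction of loci whose quality-filtered depth is at least minDepth (e.g. 8). */
    static double fractionCoveredAtLeast(byte[][] lociQuals, int minBaseQuality, int minDepth) {
        if (lociQuals.length == 0) {
            return 0.0;
        }
        int covered = 0;
        for (byte[] quals : lociQuals) {
            if (filteredDepth(quals, minBaseQuality) >= minDepth) {
                covered++;
            }
        }
        return covered / (double) lociQuals.length;
    }
}
```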
ebanks
2869270c11
Fixed deletion depth calculation plus mis-spelling in ReadBackedPileup method.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2315 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 21:11:42 +00:00
ebanks
31b1d60d28
Generalized the StratifiedAlignmentContext code so that it's easy to add new ways to stratify. Then added an MQ0-free stratification so we don't need to be carrying around 2 different alignment contexts (full vs. mq0-free) anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2314 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:50:06 +00:00
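Stratifying a context (forward vs. reverse strand, and now MQ0-free) amounts to partitioning the reads at a locus in a single pass. A minimal sketch under that reading; the Stratum enum, the Read class and its fields are hypothetical stand-ins for the real AlignmentContext and SAMRecord types.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch in the spirit of StratifiedAlignmentContext. */
final class StratificationSketch {
    enum Stratum { COMPLETE, FORWARD, REVERSE, MQ0_FREE }

    static final class Read {
        final String name;
        final boolean negativeStrand;
        final int mappingQuality;

        Read(String name, boolean negativeStrand, int mappingQuality) {
            this.name = name;
            this.negativeStrand = negativeStrand;
            this.mappingQuality = mappingQuality;
        }
    }

    /** One pass over the reads fills every stratum. */
    static Map<Stratum, List<Read>> stratify(List<Read> reads) {
        Map<Stratum, List<Read>> strata = new EnumMap<>(Stratum.class);
        for (Stratum s : Stratum.values()) {
            strata.put(s, new ArrayList<Read>());
        }
        for (Read r : reads) {
            strata.get(Stratum.COMPLETE).add(r);
            strata.get(r.negativeStrand ? Stratum.REVERSE : Stratum.FORWARD).add(r);
            if (r.mappingQuality > 0) {   // MQ0-free: drop mapping-quality-zero reads
                strata.get(Stratum.MQ0_FREE).add(r);
            }
        }
        return strata;
    }
}
```

In this shape, adding a new way to stratify means one more enum constant and one more rule in the loop, and callers no longer need to carry separate full and MQ0-free contexts.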
hanna
0c396f04a2
Fix obvious cut/paste error in output stream management code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2313 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:23:13 +00:00
ebanks
11ac7885b0
Pull out StratifiedAlignmentContext code so other walkers can use it.
...
This is basically a wrapper class around AlignmentContext which allows you to stratify a context by e.g. reads on forward vs. reverse strands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2312 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:21:16 +00:00
hanna
adb2fdbee7
Before, we were only checking that the reference was present if @Requires required that a reference was present. Now we always check that a reference is present, so that we get an intelligent error message.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2311 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:15:48 +00:00
hanna
5eac510b2f
Refactor the code I gave Eric yesterday to output command line arguments.
...
Convert it from a completely wonky solution to a slightly less wonky solution
that will work in more cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2310 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 18:57:54 +00:00
hanna
74b8055b6a
Only show extra walker help if the user didn't specify a walker or specified
...
an invalid walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2309 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 16:43:06 +00:00
ebanks
0fae798b3a
1. Discoverable base calculations don't care about Genotypes (use Variation's PError regardless of whether the call is ref or var - it's the correct value even for ref calls).
...
2. Call a base genotypable if any of the Genotypes is above the threshold (you can't assume there's a single Genotype associated with the Variation).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2306 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:26:06 +00:00
ebanks
a45adadf1f
VCFGenotypeRecord already defines all the methods needed to be SampleBacked, so let's annotate it as being SampleBacked. This way, when used as a generic Genotype, sample data can be retrieved.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2305 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:16:21 +00:00
ebanks
78d5ac9bc2
Don't check het count when there are multiple Genotypes per Variation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2304 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:07:47 +00:00
ebanks
f7c44ad019
- Read in arguments for the header based on reflection
...
- Hook up Variation and Genotype in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2300 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:35:33 +00:00
hanna
408f6f3dee
Refactoring of prior commit: better handling of unnamed package within the help system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2297 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 20:12:35 +00:00
hanna
1d2151adcf
Better handling of nulls output by
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2296 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 19:34:56 +00:00
ebanks
40c2d7a4bc
Fix all-bases-mode and genotype-mode in the UG and add integration tests for them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2295 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 17:41:30 +00:00
ebanks
4e54b91ce4
UG now outputs the FORMAT header fields when there's genotype data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2294 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 16:31:07 +00:00
rpoplin
12c49ea485
Added DuplicateReadFilter to filter out reads that are marked as duplicates.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2293 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 15:42:53 +00:00
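In the SAM format, bit 0x400 of the FLAG field marks a read as a PCR or optical duplicate, so a duplicate filter reduces to a bit test. A minimal sketch on the raw integer flag; DuplicateFilterSketch is a hypothetical name, and the real filter presumably works on read objects rather than raw flags.

```java
/** Hypothetical duplicate-read filter on the raw SAM FLAG field. */
final class DuplicateFilterSketch {
    /** SAM FLAG bit: PCR or optical duplicate. */
    static final int DUPLICATE_FLAG = 0x400;

    /** Returns true if the read should be filtered out, i.e. it is marked as a duplicate. */
    static boolean filterOut(int samFlags) {
        return (samFlags & DUPLICATE_FLAG) != 0;
    }
}
```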
ebanks
fb900b12e1
VariantFiltration now details the filters it has used in the header of the VCF it produces.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2292 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 15:36:15 +00:00
ebanks
7a76e13459
Better explanation in the exception being thrown.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2291 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:59:36 +00:00
ebanks
8d67d9ade3
-Minor fix in UG for all-bases mode
...
-Make minConfidenceScore in VariantEval a double so non-integer values can be used (requested by Steve H).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2290 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:49:10 +00:00
ebanks
717eb1de96
- Depth annotation now includes MQ0 reads
...
- Removed MQ0 annotation
- Updated RMS MQ annotation to use new pileup
- UG now outputs all of its arguments as key/value pairs in the header (for VCF)
- Cleaned up VCFGenotypeWriterAdapter interface a bit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2288 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 02:53:00 +00:00
ebanks
e8822a3fb4
Stage 3 of Variation refactoring:
...
We are now VCF3.3 compliant.
(Only a few more stages left. Sigh.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 21:43:28 +00:00
hanna
9e2f831206
A bit of cleanup in preparation for Picard patch.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 16:09:04 +00:00
hanna
d3b78338da
Get rid of characters in the docs that aren't universally compatible with
...
character sets used throughout the group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2285 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 21:41:07 +00:00
hanna
d75d3a361a
Clean up some of the walker help output based on additional experience and
...
feedback received. Also, add a flag to build.xml to disable generation of
docs on demand (use ant -Ddisable.doc=true to disable docs).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2284 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 21:33:11 +00:00
hanna
a3e88c0b1c
Cleanup results of bad merge.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2281 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 19:30:49 +00:00
hanna
10be5a5de9
Move some files around to reflect our growing help infrastructure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2280 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 19:23:12 +00:00
rpoplin
1d5b9883db
Added --solid_recal_mode argument to experiment with different ways of dealing with solid reference bias. Currently the default option is DO_NOTHING, which keeps the same behavior as the old recalibrator. Eventually the new methods in RecalDataManager will be moved over to a SolidUtils class. Added transition and transversion methods to BaseUtils that work like simpleComplement, used with the color space in my solid methods. Also, initial check-in of HomopolymerCovariate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2276 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 14:26:27 +00:00
depristo
8f461d3c40
Critical bug fix for VariantEval dbSNP calculations. Moved the system over to the new improved ROD iterators, resulting in dbSNP rates jumping 5% or so, due to masking of true SNPs by preceding indels.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2274 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 03:36:38 +00:00
hanna
8089aa3c50
Adding support to override the help text.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2273 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 00:16:26 +00:00
ebanks
c0528cd88e
Updated the CallsetConcordance classes to use new VCF Variation code... and uncovered a whole bunch of VCF bugs in the process. I'm not convinced that I got them all, so I'll unit test like crazy when the refactoring is done.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2272 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 11:43:40 +00:00
ebanks
b6f8e33f4c
Stage 2 of Variation refactoring:
...
VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype.
Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else. Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 06:48:03 +00:00
hanna
3b440e0dbc
Add a taglet to allow users to override the display name in command-line help.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2270 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 04:12:10 +00:00
ebanks
08f2214f14
Stage 1 of massive Variation/Genotype refactoring.
...
This stage consists only of the code originating in the Genotyper and flowing through to the genotype writers. I haven't finished refactoring the writers and haven't even touched the readers at all.
The major changes here are that
1. Variations which are BackedByGenotypes are now correctly associated with those Genotypes
2. Genotypes which have an associated Variation can actually be associated with it (and then return it when toVariation() is called).
The only integration tests which need to be updated are MSG-related (because the refactoring now made it easy for me to prevent MSG from emitting tri-allelic sites).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2269 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 03:12:41 +00:00
hanna
b04de77952
First pass at a reorganized walker info display. Groups walkers by package
...
and displays walker data extracted from the JavaDoc. Needs a bit of help,
both in content and flexibility of package naming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2267 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 23:24:29 +00:00
depristo
07b88621c5
Improved RankSum calculations and RankSum annotation. Much more meaningful
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2266 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 22:16:40 +00:00
hanna
4c147329a9
Turn javadoc comments for packages and classes into key/value pairs in a properties file. Embed the properties file
...
in GenomeAnalysisTK.jar. Still no support for actually displaying the archived javadoc. Also change the approach
to providing package javadocs: retired the deprecated package.html file in favor of Java 1.5-style package-info.java.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2263 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 20:08:41 +00:00
ebanks
1e8dcc30da
-dbSNP rod should not implement VariantBackedByGenotype since dbsnp records have no genotype data
...
-added code to cache the allele list so it didn't need to get recomputed each time it was requested.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2260 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 14:56:48 +00:00
ebanks
58937bf9ba
You can now use the -exp flag to tell the Genotyper to include experimental annotations when it calls out to VariantAnnotator.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2256 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 04:45:05 +00:00
ebanks
b05e73a914
Finished implementation of the Wilcoxon Rank Sum Test thanks to Tim Fennell (calculating the normal approximation) and Nick Patterson (dithering to break tie bands).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2255 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 04:04:39 +00:00
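A minimal sketch of a Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation, with ties broken by dithering: a small random jitter is added to every value so no two values share a rank. The jitter scheme, sign convention and method signature are assumptions, not the committed implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical rank-sum sketch: normal approximation plus dithering for ties. */
final class RankSumSketch {
    /**
     * Returns the z-score of U for sample x versus sample y (negative when x ranks lower).
     * jitter must be smaller than half the gap between distinct values, otherwise
     * dithering could reorder values that were not tied.
     */
    static double zScore(double[] x, double[] y, double jitter, Random rng) {
        int n1 = x.length;
        int n2 = y.length;
        if (n1 == 0 || n2 == 0) {
            throw new IllegalArgumentException("both samples must be non-empty");
        }
        int n = n1 + n2;
        final double[] values = new double[n];
        boolean[] fromX = new boolean[n];
        for (int i = 0; i < n1; i++) {
            values[i] = x[i] + (2.0 * rng.nextDouble() - 1.0) * jitter;
            fromX[i] = true;
        }
        for (int j = 0; j < n2; j++) {
            values[n1 + j] = y[j] + (2.0 * rng.nextDouble() - 1.0) * jitter;
        }
        // sort indices by dithered value; ranks are then simply 1..n
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));

        double rankSumX = 0.0;
        for (int rank = 1; rank <= n; rank++) {
            if (fromX[order[rank - 1]]) {
                rankSumX += rank;
            }
        }
        double u = rankSumX - n1 * (n1 + 1) / 2.0;
        double mean = n1 * (double) n2 / 2.0;
        double sd = Math.sqrt(n1 * (double) n2 * (n + 1) / 12.0);
        return (u - mean) / sd;
    }
}
```

Dithering keeps the simple no-ties variance formula valid at the cost of a small amount of randomness in the result; the alternative is to assign average ranks and apply a tie correction to the variance.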
ebanks
861221d046
- Moved various header line printing into a single method
...
- Fixed output for coverage above min depth
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2254 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 02:15:43 +00:00
ebanks
aef4be5610
Moved CoarseCoverageWalker to core and packaged both coverage walkers in coverage/
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2249 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:53:36 +00:00
ebanks
df4e001a07
Renamed to more accurately describe its function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2248 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:34:49 +00:00
ebanks
c2017cc91b
PrintCoverageWalker functionality moved to DepthOfCoverageWalker. Added integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2247 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:23:59 +00:00
ebanks
01cf5cc741
1. Merged CoverageHistogram into DepthOfCoverageWalker
...
2. Fixed bug in histogram calculation for small intervals
3. Better output in DoCWalker
4. Comments added to code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2245 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:01:53 +00:00
ebanks
44b9f60735
PercentOfBasesCovered functionality moved to DepthOfCoverageWalker. Added integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2244 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 16:11:09 +00:00
ebanks
126d1eca35
Move to core (qc/)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2243 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:45:58 +00:00
ebanks
9da5cc25ad
More archiving (with permission from Andrey) plus a move to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2242 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:40:27 +00:00
aaron
b3bdcd0e60
make sure we close the error log stream in CommandLineProgram if it's opened; unit tests and clean-up for BasicVariation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2241 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 06:59:27 +00:00
ebanks
a88202c3f6
Refactored DoCWalker to output in a more helpful and usable style. It now outputs in tabular format with 2 different sections: per locus and then per interval.
...
I am now at a point where I can merge the functionality from other coverage walkers into this one.
Thanks to Andrew for input.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2239 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 05:28:21 +00:00
ebanks
d7e4cd4c82
Moving some useful and stable walkers to core:
...
- ClipReads
- PrintRODs (generalized to print all RODs that are Variations)
- FixBAMSortOrderTag (added documentation to walker so that people know what it does and why)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2238 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 03:00:45 +00:00
rpoplin
46f3d3e39b
Added comments to AnalyzeCovariates and R scripts. R script prevents residuals from going off the edge of the plot. Added skeleton code to the recalibration walkers showing how we plan to handle SOLID reference inserting behavior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2233 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 23:15:52 +00:00
depristo
c776f9fb90
Simple utilities for dealing with Complete Genomics data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2230 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 22:51:41 +00:00
ebanks
a09fee2b5e
Moved some more walkers to oneoffprojects and killed an old indel-related walker that isn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2228 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:28:07 +00:00
depristo
dec0a781c2
Un-reinventing the wheel. --sleep argument removed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2227 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:19:28 +00:00
ebanks
a3343c75db
Move and rename a hybrid-selection-specific coverage calculation to hybridselection/
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2225 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:11:22 +00:00
ebanks
2c83f2f2bc
Move MSG (plus the now-obsolete classes it depends on) to oneoffprojects (with permission from Jared).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2224 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:04:22 +00:00
chartl
6a9e7bea05
Removing experimental annotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2220 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 19:03:55 +00:00
jmaguire
c180a76b05
Added option "append": if set, and the specified discovery output already exists, don't re-call anything that's already present in that file. Append new calls to it.
...
Great for resuming long jobs that died partway through.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2219 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 18:56:19 +00:00
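Resuming in "append" mode boils down to reading the loci already present in the existing output and calling only the missing ones. A minimal sketch under loudly stated assumptions: the output format (one call per line, locus as the first whitespace-separated token, '#' header lines) and all names here are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical append/resume sketch; the line format is assumed for illustration. */
final class AppendModeSketch {
    /** Collects loci from existing output lines, skipping blanks and '#' headers. */
    static Set<String> alreadyCalled(List<String> existingLines) {
        Set<String> loci = new HashSet<>();
        for (String line : existingLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            loci.add(trimmed.split("\\s+")[0]);
        }
        return loci;
    }

    /** Only loci missing from the existing file need to be called again. */
    static boolean shouldCall(String locus, Set<String> alreadyCalled) {
        return !alreadyCalled.contains(locus);
    }
}
```

A real implementation would also need to cope with a truncated final line left by a job that died mid-write.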
ebanks
0a2304eff8
- Rename minConfidenceScore in VariantEval to minPhredConfidenceScore
...
- Moved validation walkers to new qc dir
- Killed unused test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2218 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:59:19 +00:00
ebanks
a5dfc9107d
- Cleaned up annotation code some more
...
- Use QualityUtils when phred-scaling now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2217 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:45:29 +00:00
ebanks
7055a3ea2d
- All annotations are now required to return their VCF INFO keys and descriptions
...
- Renamed keys to fit with the standard naming
- FisherStrand is no longer standard
- Integration tests no longer test experimental annotations since they're not stable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2216 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:24:06 +00:00
rpoplin
67179e2412
Initial checkin of AnalyzeCovariates.java which replaces analyzeRecalQuals_1KG.py and is updated to use the new Covariates system. It creates similar plots of residual error for each covariate that was used in the calculation. There is also an option to filter out base qualities below a given threshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2215 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:47:35 +00:00
ebanks
2838629724
-VCF writer now checks whether the allele frequency has been set before trying to write it out.
-Renamed methods to be more consistent.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2214 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:25:32 +00:00
depristo
6231637615
fixes for VariantAnnotations and second bases. Misc. removal of failing (and unstable) integration tests that require rereview
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2213 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 15:41:35 +00:00
aaron
d487428468
remove incorrect parentheses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2211 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 06:46:32 +00:00
ebanks
b979bd2ced
- Optimized implementation of -byReadGroup in DoCWalker
- Added implementation of -bySample in DoCWalker
- Removed CoverageBySample and added a watered down version to the examples directory
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2209 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 03:39:24 +00:00
ebanks
7c73496e72
Moved DoC walker over to new pileup system so it no longer moves like it's stuck in molasses.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2208 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 02:46:39 +00:00
ebanks
ba8a8febc6
Thanks to Steve Hershman for finding this bug:
getNegLog10PError() does not equal the confidence score (you need to multiply by 10 as confidence is traditionally phred scaled). Probably we should change the method to be getNeg10Log10PError(). Anyone have strong feelings on this?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2207 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 01:59:03 +00:00
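The entry above turns on the difference between a -log10 error probability and a phred-scaled confidence, Q = -10 * log10(P_error). A minimal Java sketch of the conversion; the `PhredScale` class and its method names are hypothetical illustrations, not GATK API:

```java
/** Hypothetical helper illustrating phred scaling; not part of the GATK API. */
final class PhredScale {
    private PhredScale() {}

    /** Convert a -log10(P_error) value into a phred-scaled confidence by multiplying by 10. */
    static double phredFromNegLog10(double negLog10PError) {
        return 10.0 * negLog10PError;
    }

    /** Convert an error probability directly: Q = -10 * log10(P_error). */
    static double phredFromPError(double pError) {
        return -10.0 * Math.log10(pError);
    }
}
```

For example, an error probability of 0.001 has -log10(P_error) = 3, which corresponds to a phred-scaled confidence of 30.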
ebanks
3303808a8f
Yet more walkers moved to oneoffprojects.
Made hybridselection subdir in playground.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2205 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:29:12 +00:00
ebanks
05923f7fba
Started transition to oneoffprojects.
Moved/killed a few other walkers (with permission).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2204 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:19:02 +00:00
ebanks
c36069355e
Trivial change to verbose
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2203 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 20:48:10 +00:00
jmaguire
74f6526e09
VCFHomogenizer: A class that extends InputStream and dynamically re-writes pilot1 VCFs to be on-spec.
VCFTool: A command-line tool with various useful VCF functions (validate, grep, concordance).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2202 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:55:42 +00:00
jmaguire
adf8f1f8b3
Add an InputStream constructor, which is immensely useful for various reasons.
Also a minor performance optimization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2201 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:25:00 +00:00
ebanks
e581cceab6
Got Kris's permission to delete these walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2200 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:57:28 +00:00
rpoplin
3180fffd43
Eliminated unnecessary boxing of longs in RecalDatum. Changes to RecalDatum in preparation for new AnalyzeCovariates script. Updated TableRecalibrationWalker to make use of these changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2199 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:49:05 +00:00
chartl
21a9a717e4
Some minor changes and tests:
- DepthOfCoverage is now by reference (so locus-by-locus output correctly reports zero-coverage bases)
- VariantsToVCF now lets you bind variants with any string except intervals and dbsnp (not just NA######)
- A PileupWalker integration test on a particularly nasty FHS site
- Two second-base annotation related integration tests on that same site
+ outputs were all hand-validated in MATLAB, within a certain tolerance for the annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2197 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 15:15:54 +00:00
ebanks
084337087e
Removing from the repository deprecated code and walkers for which I had the green light.
Moved piecemealannotator and secondarybases to archive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2195 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:58:20 +00:00
ebanks
2c16c18a04
Move Andrey's old indel code (plus MSG accuracy test, which depends on it) to archive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2194 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:29:00 +00:00
ebanks
7c6c490652
An unfinished implementation of the Wilcoxon rank sum test and a variant annotation that uses it. I need to merge and update this code with Tim's implementation somehow - but that won't happen until later this week, so I'm committing this before I accidentally blow it away.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2193 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:56:17 +00:00
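For reference, the Wilcoxon rank-sum statistic mentioned above is the sum of the ranks of one group within the pooled, sorted sample, with tied values sharing their average rank. The sketch below computes only that statistic W and the derived Mann-Whitney U (no p-value or normal approximation). It is not the unfinished implementation from this commit or Tim's, and the `RankSum` name is hypothetical:

```java
import java.util.Arrays;

/** Hypothetical sketch of the Wilcoxon rank-sum statistic; not the code from this commit. */
final class RankSum {
    private RankSum() {}

    /** W: sum of the 1-based ranks of x's values within the pooled sample; ties share their average rank. */
    static double rankSum(double[] x, double[] y) {
        int n = x.length + y.length;
        double[] pooled = new double[n];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);

        // Sort pooled indices by value; an index < x.length means the value came from x.
        Integer[] order = new Integer[n];
        for (int idx = 0; idx < n; idx++) {
            order[idx] = idx;
        }
        Arrays.sort(order, (a, b) -> Double.compare(pooled[a], pooled[b]));

        double w = 0.0;
        int i = 0;
        while (i < n) {
            // Extend j over the run of values tied with position i.
            int j = i;
            while (j + 1 < n && pooled[order[j + 1]] == pooled[order[i]]) {
                j++;
            }
            // Positions i..j hold ranks i+1..j+1, whose average is (i + j + 2) / 2.
            double averageRank = (i + j + 2) / 2.0;
            for (int k = i; k <= j; k++) {
                if (order[k] < x.length) {
                    w += averageRank;
                }
            }
            i = j + 1;
        }
        return w;
    }

    /** Mann-Whitney U for x, derived from W: U = W - n1 * (n1 + 1) / 2. */
    static double mannWhitneyU(double[] x, double[] y) {
        int n1 = x.length;
        return rankSum(x, y) - n1 * (n1 + 1) / 2.0;
    }
}
```

For x = {1, 2} and y = {2, 3}, the pooled ranks are 1, 2.5, 2.5, 4, so W for x is 1 + 2.5 = 3.5 and U is 0.5.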
ebanks
00f15ea909
Improved performance of deletion-free pileup and added mapping-quality-zero-free pileup convenience method.
Finished converting genotyper and annotator code to new ReadBackedPileup system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2192 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:50:47 +00:00
rpoplin
6bb864da2a
More misc cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2191 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 22:29:07 +00:00
rpoplin
b89b9adb2c
misc code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2190 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 21:16:00 +00:00
depristo
e793e62fc9
minor code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2189 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:57:20 +00:00
rpoplin
4969cb1957
CountCovariates uses new optimized ReadBackedPileup. It is also smarter about re-doing calculations for the dbSNP variation rate sanity check.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2188 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:35:40 +00:00
ebanks
add2fa7ab4
more use of new ReadBackedPileup optimizations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2187 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:04:01 +00:00
rpoplin
817e2cb8c5
Recalibrator makes use of the new GATKSAMRecord wrapper and now no longer has to hash the SAMRecord. Covariate's getValue method signature has changed to take the SAMRecord instead of the ReadHashDatum. ReadHashDatum removed completely.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2185 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 19:59:17 +00:00
ebanks
e9a8156cfb
Use new optimized ReadBackedPileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2184 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 18:17:18 +00:00
rpoplin
d8146ab23d
Changed the format of the recalibration csv file slightly so that it is easier to load the file into something like R and look at the values of the covariates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2183 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 17:55:23 +00:00
ebanks
a184d28ce9
Completing the optimization started by Matt: we now wrap SAMRecords and SAMReadGroupRecords with our own versions which cache oft-used variables (e.g. platform, readString, strand flag). All walkers automagically get this speedup since the wrapping occurs in the engine.
I note that all integration/unit tests pass except for BaseTransitionTableCalculatorJava, which is already broken.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2182 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 17:39:29 +00:00
depristo
af22ca1b47
Bug fixes for VariantEval. dbCoverage now reports the dbSNP rate, not some weird eval_snps_in_db as before. We now separate non-indel and non-snp db sites in dbcoverage. Some dbSNP records don't fit into these two categories. Also fixed a consistency issue where novel / known sites were being determined solely by whether dbSNP had a record there, rather than the stricter dbcoverage screen for isSNP().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2180 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 01:39:01 +00:00
chartl
27651d8dc2
Oops. numReads is now called size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2175 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:59:17 +00:00
chartl
21744e024b
Quick walker that determines the % of bases covered at (user-defined depth)x. I've been maintaining it in my directories alone, but now that I've accidentally deleted it twice, into playground it goes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2174 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:51:19 +00:00
hanna
3300ca906a
An iterator for Eric to use when injecting his new wrapping reads -- a stopgap solution for getting additional caching functionality into a SAMRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2173 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 22:25:52 +00:00
rpoplin
26db15be5c
Added SingleReadGroupFilter to only use reads from a specific read group, filtering out all others.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2172 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 20:33:59 +00:00
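A minimal sketch of the filtering idea in the entry above, keeping only reads from one read group. `MiniRead` and `ReadGroupFilterSketch` are hypothetical stand-ins rather than the GATK or SAM classes, and this predicate returns true for reads to keep, which may be the opposite of the engine's own filter convention:

```java
import java.util.Objects;
import java.util.function.Predicate;

/** Hypothetical stand-in for a read that carries a read-group ID (which may be null). */
final class MiniRead {
    final String readGroupId;

    MiniRead(String readGroupId) {
        this.readGroupId = readGroupId;
    }
}

/** Hypothetical single-read-group filter; not the GATK SingleReadGroupFilter class. */
final class ReadGroupFilterSketch implements Predicate<MiniRead> {
    private final String keepId;

    ReadGroupFilterSketch(String keepId) {
        this.keepId = Objects.requireNonNull(keepId, "read group ID to keep");
    }

    /** Returns true if the read should be kept; reads with no read group are dropped. */
    @Override
    public boolean test(MiniRead read) {
        return keepId.equals(read.readGroupId);
    }
}
```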
rpoplin
91f5672a32
misc cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2171 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 19:56:20 +00:00
rpoplin
d1298dda13
Encapsulated the sections of code that were shared by the two Recalibration walkers. This includes both the shared command line arguments and the section of code in the map methods which pull out data from the SAMRecord and stuff it into the ReadHashDatum. Command line arguments are now passed to the Covariates using a new initialize method that all Covariates must implement. Updated the dbsnp sanity check warning message to be less cryptic.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2170 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 19:54:10 +00:00
depristo
75b61a3663
Updated, optimized ReadBackedPileup. Updated test that was breaking the build -- it created a pileup from reads without bases...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2169 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 23:30:39 +00:00
alecw
ac1b289d55
Add tile to ReadHashDatum, and implement TileCovariate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2166 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 21:41:42 +00:00
depristo
db40e28e54
ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 20:54:44 +00:00
rpoplin
b44363d20a
Removed silly casts from Integer to int.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2164 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 19:59:21 +00:00
ebanks
d0f673f0c0
Use Math.abs so we don't get (inconsistent) -0's
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2160 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 19:08:34 +00:00
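One common way to get an inconsistent -0 is negating a zero, for example -log10(1.0). The commit does not say which values it applied Math.abs to, so the sketch below (hypothetical `NegLog10` class) only shows the mechanism. Java's Math.abs maps -0.0 to 0.0, and because -0.0 == 0.0 is true, the difference only shows up when the value is printed:

```java
/** Hypothetical illustration of negative zero; not code from this commit. */
final class NegLog10 {
    private NegLog10() {}

    /** -log10(p); for p == 1.0 this is -0.0, which prints as "-0.0". */
    static double naive(double p) {
        return -Math.log10(p);
    }

    /** Same value, but Math.abs turns -0.0 into 0.0. Only valid because -log10(p) >= 0 for p in (0, 1]. */
    static double normalized(double p) {
        return Math.abs(-Math.log10(p));
    }
}
```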
rpoplin
6ff8526592
Added arguments to the recalibration walkers so the user can specify the default read group id and platform to use when a read has no read group. There are also options to force every read group and every platform to be the specified values. Added integration tests that use a bam file with no read groups. Added comments to all the covariates to explain what each of the methods in the Covariate interface are used for.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2157 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 15:41:12 +00:00
aaron
cfbd9332b0
small cleanups for the GATK paper genotyper; switched to the managed output system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2156 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 08:04:13 +00:00
ebanks
e1e5b35b19
Don't have the spanning deletions argument be a hard cutoff, but instead be a percentage of the reads in the pileup. Default is now 5% of reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2155 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 04:54:44 +00:00
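A minimal sketch of a fraction-based threshold replacing a hard count. The names are hypothetical, and applying it as a strict greater-than test on (reads with a spanning deletion) / (total reads) is an assumption. Only the 5% default comes from the commit message:

```java
/** Hypothetical sketch; class and method names are illustrative, not the GATK API. */
final class SpanningDeletions {
    /** 5% default, as stated in the commit message. */
    static final double DEFAULT_MAX_FRACTION = 0.05;

    private SpanningDeletions() {}

    /** True if the fraction of reads with a deletion spanning the site exceeds maxFraction. */
    static boolean tooManySpanningDeletions(int deletionReads, int totalReads, double maxFraction) {
        if (totalReads == 0) {
            return false; // empty pileup: nothing to judge
        }
        return (double) deletionReads / totalReads > maxFraction;
    }
}
```

Unlike a hard cutoff, the same number of deletion reads can pass at high depth and fail at low depth.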
depristo
03342c1fdd
Restructuring and interface change to ReadBackedPileup. We no longer support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class as separate APIs. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements, so you can use the syntax for (PileupElement p : pileup) and access p.getBase(), p.getQual(), and p.getSecondBase() directly. Only a few straggler walkers use the old-style interface, and those walkers will be retired soon. Documentation coming in the AM. Please, everyone, use the new syntax: it's safer, and it will be more efficient as soon as LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the changeover, I discovered several bugs in the second-best base code due to things getting out of sync, but these were resolved manually. All other integration tests passed without modification.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 03:51:41 +00:00
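The for-each syntax described above works for any class that implements java.lang.Iterable. A self-contained sketch with hypothetical `MiniPileup` / `MiniPileupElement` stand-ins (not the GATK ReadBackedPileup or PileupElement classes), mirroring the getBase()/getQual() accessors named in the message:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a pileup element; not the GATK PileupElement class. */
final class MiniPileupElement {
    private final byte base;
    private final byte qual;

    MiniPileupElement(byte base, byte qual) {
        this.base = base;
        this.qual = qual;
    }

    byte getBase() { return base; }
    byte getQual() { return qual; }
}

/** Hypothetical stand-in for a read-backed pileup that supports for-each iteration. */
final class MiniPileup implements Iterable<MiniPileupElement> {
    private final List<MiniPileupElement> elements;

    MiniPileup(List<MiniPileupElement> elements) {
        // Defensive, read-only copy so callers cannot mutate the pileup while iterating.
        this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
    }

    @Override
    public Iterator<MiniPileupElement> iterator() {
        return elements.iterator();
    }

    /** Example consumer: count bases equal to refBase with quality >= minQual. */
    int countMatching(byte refBase, int minQual) {
        int count = 0;
        for (MiniPileupElement p : this) {
            if (p.getBase() == refBase && p.getQual() >= minQual) {
                count++;
            }
        }
        return count;
    }
}
```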
ebanks
2cb3e53b0b
Verbose mode shouldn't be printing out 'NaN's and 'Infinity's
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2153 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 22:01:00 +00:00
rpoplin
c9ff5f209c
Added a CountCovariates integration test that uses a vcf file as the list of variant sites to skip over instead of the usual dbSNP rod.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2152 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:51:38 +00:00
ebanks
3484f652e7
1. Variation is now passed to VariantAnnotator along with the List of Genotypes so non-genotype calls have access to all relevant info.
2. Killed OnOffGenotype
3. SpanningDeletions is now SpanningDeletionFraction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2151 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:47:20 +00:00
ebanks
e05cb346f3
GenotypeLocusData now extends Variation.
Also, Variations should be INSERTIONs or DELETIONs (and not just INDELs).
Technically, VCF records can be indels now.
More changes coming
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2150 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:07:55 +00:00
rpoplin
8b30279edc
style update
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2149 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 20:56:31 +00:00
rpoplin
dffa46b380
BAM files created by TableRecalibration now have the version number and list of covariates used appended to their header with a new 'PG' tag. Eventually the entire list of command line args will be put in there as well. Big thanks to Matt and Aaron. The integration test uses the --no_pg_tag so that the md5 doesn't change every time the version number changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2148 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 20:53:57 +00:00
aaron
8fbc0c8473
fix for bug GSA-234: fasta index files couldn't handle anything but letters, numbers, or spaces in the contig name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2147 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 19:19:47 +00:00
andrewk
3fca23cd16
Added a stub treeReduce function for debugging multi-threaded execution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2146 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:51:19 +00:00
rpoplin
277e6d6b32
Further optimizations of TableRecalibration. This completes my goal of having the only math done in the map function be addition, subtraction and rounding the quality score to an integer. Everything else has been moved to the initialize method and only done once.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2145 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:21:57 +00:00
andrewk
e4546f802c
Accumulates coverage across hybrid selection bait intervals to assess effect of bait adjacency. Requires input bait intervals that have an overhang beyond the actual bait interval to capture coverage data at these points. Outputs R parseable file that has all data in lists and then does some basic plotting.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2144 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:12:34 +00:00
andrewk
e5106c9924
Hybrid selection performance statistics now include counts of the number of adjacent baits (0,1,2) using OverlapDetector and optionally include assayed bait quantities input via interval lists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2143 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:07:23 +00:00
ebanks
87c1860398
I'm not sure I believe it, but JProfiler claims that calling FourBaseProbs.isVerbose() was taking 5% of my runtime...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2142 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 17:00:32 +00:00
ebanks
b3f561710f
Optimizations:
1. Only do calculations in UG for alternate allele with highest sum of quality scores (note that this also constitutes a bug fix for a precision problem we were having).
2. Avoid using Strings in DiploidGenotype when we can (it was taking 1.5% of my compute according to JProfiler)
UG now runs in half the time for JOINT_ESTIMATE model.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2141 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 16:27:39 +00:00
rpoplin
a59e5b5e1a
Added dbSNP sanity check to CountCovariates. If the mismatch rate is too low at dbSNP sites, it warns the user that the dbSNP file is suspicious. Added option in CountCovariates and TableRecalibration to ignore read group IDs and collapse them together. Also, if the read group is null, the walkers no longer crash with a NullPointerException but instead warn the user that the read group and platform are defaulting to some values. Default window size in MinimumNQSCovariate is 5 (two bases in either direction) based on rereading of Chris's analysis.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2140 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 16:16:44 +00:00
alecw
e5e6d515c3
Fix misunderstanding of GenomeLoc interval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2138 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 15:12:49 +00:00
ebanks
cb6d6f2686
Very minor performance improvements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2137 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 05:21:07 +00:00
ebanks
c90bea39a1
read.getReadString().charAt(offset) --> read.getReadBases()[offset]
[As a courtesy I fixed all instances once I was updating GenotypeLikelihoods]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2136 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 04:25:19 +00:00
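The two access patterns in the entry above return the same base when bases are stored as single-byte ASCII characters, as in SAM. The sketch (hypothetical `BaseAccess` class, operating on a plain byte array rather than a SAMRecord) shows the equivalence; in this sketch the byte-array form avoids building a String on every lookup:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the two access patterns; not the GATK or SAMRecord code. */
final class BaseAccess {
    private BaseAccess() {}

    /** Old pattern: decode the whole read into a String, then index it. */
    static char viaString(byte[] readBases, int offset) {
        return new String(readBases, StandardCharsets.US_ASCII).charAt(offset);
    }

    /** New pattern: index the byte array directly; no String is created. */
    static char viaBytes(byte[] readBases, int offset) {
        return (char) readBases[offset];
    }
}
```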
ebanks
ec321abd7b
Added ability to filter on the QUAL field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2135 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 04:08:22 +00:00
ebanks
36d493e645
All standard annotations now inherit from StandardVariantAnnotation. Users can specify whether they want all annotations, just the standard annotations, or specific annotations. When calling in from another walker, the default is just the standard ones.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2134 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 03:55:12 +00:00
ebanks
ee5093d2c6
-Added VariantFiltration integration tests
-Added integration test for GLFs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2133 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 02:36:27 +00:00
ebanks
be6a549e7b
Added the capability to allow expressions in an integration test command (i.e. -filter 'foo') by escaping them in the command.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2132 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 02:34:48 +00:00
hanna
4837fe919c
Convenience changes. If no -BWT option is specified, pull the BWT location from the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2130 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 22:46:05 +00:00
rpoplin
9e4eadc37c
CountCovariates v2.0.2: Added a --process_nth_locus <int> argument to only use every Nth covered locus when creating the recalibration table.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2129 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 22:07:38 +00:00
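A minimal sketch of the every-Nth-covered-locus sampling described for --process_nth_locus. The class name is hypothetical, and whether the first processed locus is the 1st or the Nth covered locus is an assumption (here it is the 1st):

```java
/** Hypothetical "use every Nth covered locus" sampler; not the CountCovariates code. */
final class NthLocusSampler {
    private final int n;
    private long coveredLoci = 0;

    NthLocusSampler(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    /** Call once per covered locus; returns true for the 1st, (n+1)th, (2n+1)th, ... locus. */
    boolean shouldProcess() {
        return (coveredLoci++ % n) == 0;
    }
}
```

With n = 1 every covered locus is used, which matches the behavior without the option.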
ebanks
ed4cf3de57
Check that we're biallelic before calling isSNP()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2127 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:20:48 +00:00
rpoplin
5744a1d968
The covariates don't care about SAMRecords anymore - cleaning up the import statements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2126 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:10:12 +00:00
chartl
23983b2fd8
New annotation: ResidualQuality
Computes a metric for how much error is left that isn't explained by ref or snp bases. This is the sum of Q scores, weighted by the proportion of non-ref non-snp bases to non-snp bases. Reported in Log space.
Update to the integration test so bamboo doesn't look as though someone murdered it with a spork
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2124 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:04:01 +00:00
ebanks
70059a0fc9
Refactored joint estimation model to allow subclasses to overload PofD calculation over all frequencies. Pooled model now takes only 20% of time that it used to.
Added integration test for pooled model and updated other joint estimation tests to be more comprehensive now that they are faster.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2123 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:03:38 +00:00
rpoplin
7f947f6b60
Updated recalibrator integration tests to use all three platforms as well as a bam with multi-platform reads intermingled. CountCovariates v2.0.1: Once again uses a read filter to filter out zero mapping quality reads. Added --sorted_output option to output the table recalibration file in sorted order
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2122 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 19:51:36 +00:00
ebanks
14bf6ce83c
1. Newest version of the joint estimation model. Faster than previous version and now qscores can get to be > 39.8 for hets.
2. More sanity checks in annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2119 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 17:05:50 +00:00
hanna
ee2abd30c4
Count the best alignments and emit them to a file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2118 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 16:37:59 +00:00
rpoplin
1d46de6d34
The old recalibrator is replaced with the refactored recalibrator. Added a version message to the logger output. These walkers start at version 2.0.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2117 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 14:58:33 +00:00
ebanks
dfe7d69471
1. VCF: don't print slod if it's never set
2. UG: don't print slod if lods are infinite (todo: figure out a good guess instead)
3. UG: if probF is 0 for both of 2 alt alleles (because of precision), use log values to discriminate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2116 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 02:55:43 +00:00
ebanks
753cb100a3
Add checks for weird situations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2115 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 02:14:25 +00:00
ebanks
04d6ac940c
Always print out VCF header - not just when there is genotype data present.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2114 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 01:44:10 +00:00
ebanks
bf935a6ab1
1. Fixed bug in PrimaryBaseSecondaryBaseSymmetry code (not checking for null before trying to access object's methods) which was causing Integration Tests to fail.
2. Retired allele frequency range from UG, which wasn't very useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2113 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 01:31:48 +00:00
rpoplin
b24240664f
Reduced the number of calls to new ArrayList() in TableRecalibration. This results in a speed up of perhaps up to 6 percent (timed trials are hard).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2112 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 17:24:31 +00:00
hanna
c9c4999354
BWA: odds and ends. Get rid of some spurious debug code that was accidentally checked in. Add a better way to write out unmapped reads (thanks Kiran!). Add a pre-built version of the shared library to the repository for early adoption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2111 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 15:26:07 +00:00
depristo
9c206abb97
removing unnecessary printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2110 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 12:41:48 +00:00
chartl
59416ae06a
This is an annotation adapted from one that Mark Daly suggested some time ago. Right now it calculates:
- For all reference bases, the proportion of their second best bases that support the SNP
- the proportion of non-reference bases that support the SNP
and reports the difference between the two. Initially I was taking depth into account as well, but that did not appear to work as nicely as I'd like (even at 20,000x depth, if 95% of the non-reference bases are C, and 98% of the reference second-best-bases are C, then we would want to be suspicious of it; but perhaps slightly less so than if the depth were only 20...)
Anyway it's now available. I'm not sure how useful it will be, but I spawned the FHS annotation jobs again, so we'll see.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2109 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 00:47:49 +00:00
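The two proportions and their difference described in the entry above can be sketched as follows. The names are hypothetical, the sign convention (reference second-best support minus non-reference support) is an assumption since the message only says "the difference", and an empty input is treated as a fraction of 0:

```java
/** Hypothetical sketch of the second-base support difference; not the annotation's actual code. */
final class SecondBaseSupport {
    private SecondBaseSupport() {}

    /**
     * @param refSecondBases second-best base calls of reads whose primary base is the reference base
     * @param nonRefBases    primary base calls that are not the reference base
     * @param snpBase        the alternate (SNP) base
     * @return fraction of refSecondBases equal to snpBase minus fraction of nonRefBases equal to snpBase
     */
    static double supportDifference(char[] refSecondBases, char[] nonRefBases, char snpBase) {
        return fraction(refSecondBases, snpBase) - fraction(nonRefBases, snpBase);
    }

    /** Fraction of bases equal to target; 0 for an empty array (an assumed convention). */
    private static double fraction(char[] bases, char target) {
        if (bases.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (char b : bases) {
            if (b == target) {
                hits++;
            }
        }
        return (double) hits / bases.length;
    }
}
```

With the message's own numbers (98% of reference second-best bases and 95% of non-reference bases are C), this difference is about 0.03.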
rpoplin
98f921fe24
The refactored CountCovariates now hashes the read object into a HashMap which holds all the properties the covariates pull out of the read over and over again, such as the read group string, the bases string and its complement string, quality scores, etc. This results in a big speedup. CountCovariatesRefactored is now just slightly slower than CountCovariates (perhaps 1.07x according to my latest time trial). Thanks to Alec for suggesting IdentityHashMap. CycleCovariate now warns the user that it is defaulting to the Solexa definition of cycle when the platform string pulled out of the read is unrecognized, instead of halting with an Exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2108 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 20:38:17 +00:00
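java.util.IdentityHashMap compares keys with == and hashes them with System.identityHashCode, so a lookup never calls the key's own equals() or hashCode(). A minimal sketch of a per-object cache built on it; `IdentityCache` is a hypothetical name, and this is not the CountCovariates code:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-object cache keyed by reference identity. */
final class IdentityCache<K, V> {
    private final Map<K, V> cache = new IdentityHashMap<>();
    private final Function<K, V> compute;
    private int computations = 0;

    IdentityCache(Function<K, V> compute) {
        this.compute = compute;
    }

    /** Returns the cached value for this exact object, computing it on first access. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) { // note: a null result would be recomputed on every call
            value = compute.apply(key);
            computations++;
            cache.put(key, value);
        }
        return value;
    }

    /** Number of times the compute function has actually run. */
    int computations() {
        return computations;
    }
}
```

Two distinct objects that are equal() still get separate entries, which is the intended behavior when each read object is processed on its own.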
depristo
27122f7f97
Performance improvements for pooled caller. Now possible to actually run on real data in a finite amount of time. Minor changes to GL interface (making strandIndex public) to support cached calculations in pooled caller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2107 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 15:07:40 +00:00
ebanks
797bb83209
New VariantFiltration.
Wiki docs are updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2105 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 19:50:26 +00:00
hanna
a78bc60c0f
Minor tweak to improve ease-of-use of iterator system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2104 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 18:24:19 +00:00
hanna
4fbb6d05d0
Refactoring. Push the revisions to the common aligner interface down into the aligner base classes. Hack the managed implementation to support the new interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2103 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 17:08:09 +00:00
ebanks
d84444200b
The Unified Genotyper now sorts the sample names in the vcf that it outputs.
[There was no reason to enforce that every VCF being output from the GATK should have the samples sorted, since someone might want them ordered non-alphabetically]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2102 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 16:13:18 +00:00
hanna
38a030f2ba
Finishing off data transfer conduits for single alignment generator.
Misc bug fixes elsewhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2101 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 15:21:59 +00:00
ebanks
2a5349d886
VariantAnnotator now adds dbsnp id if a dbsnp rod is supplied and it's not already set for a record
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2100 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 03:26:09 +00:00
ebanks
b434c1c240
Check for null entries before adding
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2099 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 03:12:20 +00:00
depristo
82fd824c4d
Continuing improvements to unified genotyper
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2098 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 01:39:29 +00:00
aaron
33dcfc858d
updates to the paper genotyper based on Mark's comments. There's still more work to do, including more testing.
Also a 250% improvement in the getBases() and getQuals() of BasicPileup, which was nearly all of the runtime for the genotyper (using primitives instead of objects when possible).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2097 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 23:06:49 +00:00
rpoplin
22aaf8c5e0
Added the old recalibrator integration tests to the refactored recalibrator sitting in playground.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2096 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 22:43:28 +00:00
hanna
a95302fe98
Single alignment generator, another checkpoint. Does generate single alignments, but some of the data still needs to be plumbed through and it may leak memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2095 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 21:20:03 +00:00
hanna
a972b2769f
Checkpoint. Add first phase of single alignment interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2094 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 19:03:43 +00:00
aaron
6ba1f3321d
Fixed the sample mix-up bug Kiran discovered, and added a unit test in the VCF reader class (Thanks for the good example files Kiran). Also renamed the toStringRepresentation function to toStringEncoding, and added a matching method in VCFGenotypeRecord.
...
Updated the integration tests that were failing due to different ordering of genotyping entries in VCF; I'll check in the VCF diff tool I wrote when I get a cycle or two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2092 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 18:17:47 +00:00
chartl
b4babb82eb
adding an extra bit of data to come out of CTT (number of chips with actual data)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2091 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:46:10 +00:00
alecw
7623b39927
Add rodPicardDbSNP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2088 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:27:46 +00:00
alecw
b2b4ff7eca
Cache SAMReadGroup rather than get it twice
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2087 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:27:18 +00:00
depristo
eeb3a3fffb
comments for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2081 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 12:56:04 +00:00
aaron
7997455f38
first go of the genotyper for the GATK paper. More testing and review tomorrow to call it done.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2080 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 07:55:24 +00:00
ebanks
7b957d3e2e
Make the whining from Khalid's office stop already
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2079 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 03:04:48 +00:00
hanna
85bc9d3e91
(Hopefully) temporary hack: load contig information by contig name rather than contig id to avoid
...
off-by-one errors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2078 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 23:33:27 +00:00
rpoplin
0fbd81766b
CountCovariates now uses any rod of type VariationRod with the name dbsnp as the source of known variant sites to skip over. It also grabs the platform string out of the read group when deciding which algorithm to use to calculate machine cycle. In this way it can now handle multi-platform bams. I added a new covariate: PositionCovariate. This is simply the offset regardless of which platform the read came from. This will be useful for comparing between the two covariates. Finally, this message serves as a warning that I will be killing the old recalibrator tomorrow after I've updated and verified new integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2077 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 23:03:47 +00:00
ebanks
f667bed7fc
-Don't annotate allele balance or on-off genotype if there's no genotype data
...
-If qscore is infinity (because of precision) make a best guess instead
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2076 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 22:01:32 +00:00
ebanks
087e01a439
minor changes for --noSLOD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2074 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 18:48:01 +00:00
ebanks
a70cf2b763
A bunch of changes needed to make outputting pooled calls possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2073 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 18:42:57 +00:00
ebanks
0a35c8e0ba
1. The joint estimation model now constrains genotypes to be AA, AB, or BB only (i.e. to use a single alternate allele). Note that this doesn't work for the old models (point estimate or SSG) because calculations aren't divided by alternate allele.
...
2. Allele frequency spectrum is not emitted for single samples (since it doesn't make sense).
3. If in pooled mode, throw an exception if the pool size isn't set appropriately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2072 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:43:15 +00:00
chartl
405c6bf2c1
VariantEval genotype concordance for pools! Integration test coming soon
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2071 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:24:54 +00:00
depristo
6fe1c337ff
Pileup cleanup; pooled caller v1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2070 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:03:48 +00:00
rpoplin
f0a234ab29
TableRecalibration is now much smarter about hashing calculations, taking advantage of the sequential recalibration formulation. Instead of hashing RecalDatums it hashes the empirical quality score itself. This cuts the runtime by 20 percent. TableRecalibration also now skips over reads with zero mapping quality (outputs them to the new bam but doesn't touch their base quality scores).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2069 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 16:47:44 +00:00
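The idea of hashing the final empirical quality rather than the underlying RecalDatums can be sketched as a memo cache: the expensive sequential calculation runs once per distinct covariate key, and later bases with the same key get a map lookup. This is a minimal sketch, not TableRecalibration's code; the class, the List<Object> key type, and the injected calculation function are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: memoize the final recalibrated quality per covariate key
 * instead of recomputing it from stored counts for every base.
 */
final class RecalQualityCacheSketch {
    // Covariate values for one base (e.g. read group, reported quality, cycle) -> final quality.
    private final Map<List<Object>, Byte> cache = new HashMap<>();
    // Assumed to run the full sequential calculation for a key; must not return null.
    private final Function<List<Object>, Byte> sequentialCalculation;

    RecalQualityCacheSketch(Function<List<Object>, Byte> sequentialCalculation) {
        this.sequentialCalculation = sequentialCalculation;
    }

    /** Computes the quality on first use of a key, then serves it from the cache. */
    byte recalibratedQuality(List<Object> covariateKey) {
        return cache.computeIfAbsent(covariateKey, sequentialCalculation);
    }

    /** Reads with mapping quality zero are written out unchanged, not recalibrated. */
    static boolean shouldRecalibrate(int mappingQuality) {
        return mappingQuality > 0;
    }
}
```

Keys must be immutable lists, since List equality and hashing are value-based; mutating a key after insertion would corrupt the cache.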
chartl
be31d7f4cc
Added - a walker that outputs relevant information about false negatives given a bunch of hapmap individuals and corresponding integration tests for it.
...
This will output for hapmap variant sites:
chromosome, position, ref allele, variant allele, number of variant alleles of the individuals, depth of coverage, power to detect singletons at lod 3, number of variant bases seen, whether or not variant was called
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2068 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 15:47:52 +00:00
chartl
b68d6e06b7
Rollback of the previous "fix" and implementation of the real fix.
...
We totally *do* want to annotate the call if called by another walker. Totally boneheaded misinterpretation of what the code was doing -- Eric, please forgive me for being an idiot.
Instead, change the StingException to what it really should be -- an IllegalStateException, which is not coincidentally already handled by the calling function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2067 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 06:09:24 +00:00
chartl
95f1be94c0
Fix for the broken build:
...
do **not** attempt to annotate if UnifiedGenotyper is called from another walker! Why this didn't break the build earlier I have no idea.
Ultimately, there should be a better way of interfacing UG with another walker -- what if some other walker wants the annotations from UG? But since we're calling map directly -- and the annotations don't get returned directly from map -- this needs to be handled differently, while the map function should ultimately return the LOD score or quality under the GCM alone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2066 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 05:56:31 +00:00
ebanks
9fb50e9bd9
Further refactoring so that pooled calling will work.
...
Okay, Mark, you should be all set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2065 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 00:18:13 +00:00
chartl
539f6f15e5
Added --
...
Second base skew annotations and integration tests. Nothing need be given except -A SecondBaseSkew; the statistic it annotates calls with is a chi-square statistic given by the deviation of the observed proportion of reference second-best-bases from the expected 1/3. Future additions may be to ask that the deviation be instead from a given transition table.
A big note for all users: All IllegalStateExceptions from the variation ROD (e.g. the RodGeliText) are dealt with SILENTLY. I understand this isn't optimal, but I'd rather simply not annotate a non-bi-allelic site than fail completely (there are quite a few such sites even on the regions over which the integration test has been written).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2064 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 00:11:13 +00:00
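The SecondBaseSkew statistic described above can be sketched as a two-category chi-square goodness-of-fit test: of n reads with a second-best base, compare the count whose second-best base is the reference against the expected n/3, and the remainder against 2n/3. This is a minimal sketch assuming exactly that formulation; the class and method names are hypothetical, it is not the annotator's code, and it does not cover the transition-table variant mentioned as future work.

```java
/** Hypothetical sketch of a second-base-skew chi-square statistic. */
final class SecondBaseSkewSketch {
    private SecondBaseSkewSketch() {}

    /**
     * @param refSecondBases reads whose second-best base equals the reference base
     * @param totalReads     reads contributing a second-best base
     * @return chi-square statistic with one degree of freedom (0 if there are no reads)
     */
    static double chiSquare(int refSecondBases, int totalReads) {
        if (totalReads <= 0) {
            return 0.0;
        }
        double expectedRef = totalReads / 3.0;            // expected count with no skew
        double expectedOther = totalReads - expectedRef;  // the remaining 2/3
        double observedOther = totalReads - refSecondBases;
        double dRef = refSecondBases - expectedRef;
        double dOther = observedOther - expectedOther;
        return dRef * dRef / expectedRef + dOther * dOther / expectedOther;
    }
}
```

For example, 10 reference second bases out of 30 matches the 1/3 expectation and scores 0, while 30 out of 30 scores 60.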
depristo
42a0bbaf46
Minor reformatting for pooled calling
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2063 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 22:06:11 +00:00
rpoplin
ec1a870905
Working with byte arrays is faster than working with Strings so the Covariates now take in byte arrays. None of the Covariates themselves used the reference base so I removed it. DinucCovariate now returns a Dinuc object which implements Comparable instead of returning a String because it was too slow. CountCovariates now uses a read filter to filter out unmapped reads and allows the user to specify -cov all which will use all of the available covariates, of which there are 7 now. If no covariates are specified it defaults to ReadGroup and QualityScore, the two required covariates. Initial code in place to leave SOLID bases alone if they have bad color space quality. TableRecalibration uses @Requires to tell the GATK to not give the reference bases since they weren't being used for anything.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2062 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 21:50:52 +00:00
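Replacing String dinucleotides with a small Comparable value type can be sketched as below: both bases are packed into one int, so comparison, equality, and hashing (needed for use as a HashMap key) avoid String work. This is an illustrative sketch, not the DinucCovariate code; DinucSketch is a hypothetical name, and taking the previous base plus the current base at an offset is an assumption.

```java
/** Hypothetical sketch of a dinucleotide covariate value used in place of a String. */
final class DinucSketch implements Comparable<DinucSketch> {
    private final byte first;   // preceding base, e.g. (byte) 'A'
    private final byte second;  // current base

    DinucSketch(byte first, byte second) {
        this.first = first;
        this.second = second;
    }

    /** Assumption: the dinucleotide at an offset is the previous base followed by the current one. */
    static DinucSketch at(byte[] bases, int offset) {
        if (offset < 1 || offset >= bases.length) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return new DinucSketch(bases[offset - 1], bases[offset]);
    }

    /** Packs both bases into one int so comparing and hashing avoid String work. */
    private int packed() {
        return ((first & 0xFF) << 8) | (second & 0xFF);
    }

    @Override
    public int compareTo(DinucSketch other) {
        return Integer.compare(packed(), other.packed());
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DinucSketch && ((DinucSketch) o).packed() == packed();
    }

    @Override
    public int hashCode() {
        return packed();
    }

    @Override
    public String toString() {
        return new String(new char[] {(char) first, (char) second});
    }
}
```

Because compareTo, equals, and hashCode all derive from the same packed value, the ordering stays consistent with equality, which keeps sorted and hashed collections in agreement.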
ebanks
4d9c826766
Integration tests actually run on real data now.
...
<tries to hide sheepish grin>
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2061 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 21:04:14 +00:00
ebanks
a048f5cdf1
-Refactored JointEstimation code so that pooled calling will work
...
-Use phred-scale for fisher strand test
-Use only 2N allele frequency estimation points
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2059 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 20:21:15 +00:00
chartl
43bd4c8e8f
Ignoring deletions in the primary pileup by default was causing the primary pileup to become shorter than the secondary pileup when building up the secondary base pileup string. This fix makes sure to include the primary Ds within the pileup so that not only are the pileups guaranteed to be the same size, the same offsets will truly correspond with the same read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2058 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 17:20:13 +00:00
aaron
aece7fa4c7
a convenience method to join a map into a single string, which I need for some VCF work. Added some documentation to the join method as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2057 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 16:50:01 +00:00
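A method that joins a map into a single string, as needed for VCF INFO-style output such as `DP=30;AF=0.5`, can be sketched as follows. The class name, parameter order, and separators are hypothetical; the actual utility's signature may differ.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of joining map entries into one string. */
final class MapJoinSketch {
    private MapJoinSketch() {}

    /**
     * Emits key + keyValueSeparator + value for each entry, with entries separated by
     * entrySeparator, in the map's own iteration order. An empty map yields "".
     */
    static String join(String entrySeparator, String keyValueSeparator, Map<?, ?> map) {
        StringJoiner joiner = new StringJoiner(entrySeparator);
        for (Map.Entry<?, ?> e : map.entrySet()) {
            joiner.add(e.getKey() + keyValueSeparator + e.getValue());
        }
        return joiner.toString();
    }
}
```

Passing a LinkedHashMap or TreeMap keeps the output order deterministic, which matters when integration tests compare output by MD5.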
asivache
21729d9311
Do not print debug message when debug mode is not requested!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2056 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 20:28:41 +00:00
rpoplin
967215066d
The old CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2055 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 19:16:46 +00:00
rpoplin
eb07c7f7f8
CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2054 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 18:44:54 +00:00
ebanks
4558375575
Stage 1 of the VariantFiltration refactoring is now complete. There now exists a parallel tool called VariantAnnotator which simply takes variant calls and annotates them with the same type of data that we used to use for filtering (e.g. DoC, allele balance). The output is a VCF with the INFO field appropriately annotated.
...
VariantAnnotator can be called as a standalone walker or by another walker, as it is by the UnifiedGenotyper. UG now no longer computes any of this meta data - it relegates the task completely to the annotator (assuming the output format accepts it).
This is a fairly all-encompassing check in. It involves changes to all of the UG code, bug fixes to much of the VCF code as things popped up, and other changes throughout. All integration tests pass and I've tediously confirmed that the annotation values are correct, but this framework could use some more rigorous testing.
Stage 2 of the process will happen later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2053 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 02:41:20 +00:00
hanna
ce5034dc5d
Finally reinstate the iterator-style interface. Get rid of some scaffolding code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2052 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 02:34:19 +00:00
kiran
103763fc84
An accessor for the VCF header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2051 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-15 09:28:25 +00:00
kiran
97ed945797
Example code for a bug in the VCF implementation. See JIRA entry at http://jira.broadinstitute.org:8008/browse/GSA-225
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2050 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-15 09:27:12 +00:00
rpoplin
88fd762436
The -rf argument is now being used for read filter and is colliding with my walkers. Changed mine to -recalFile
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2048 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 19:37:46 +00:00
rpoplin
b05119987c
Clarified some of the comments in the individual covariates now that things have been moved around to speed up the code. In general most error checking and adjustments to the data are done per read instead of per base. This means that functionality was moved out of the covariate modules and into CovariateCounterWalker and TableRecalibrationWalker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2047 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 18:44:05 +00:00
rpoplin
672472789e
Added some documentation to the helper classes. Fixed an error case in TableRecalibrationWalker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2046 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 18:13:43 +00:00
hanna
15c14add4d
Repackage the aligner for better partitioning. The C aligner, for example, is now
...
partitioned from the Java aligner, and both are partitioned from the more general-
purpose BWT reader.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2045 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 22:55:27 +00:00
rpoplin
d1b525b428
Default window size for NQS covariate is 3
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2040 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 19:24:27 +00:00
rpoplin
394c839974
Implemented NQS covariate. Extended Cycle covariate to handle 454 and SOLID reads. Added a Primer Round covariate for SOLID reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2039 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 19:22:21 +00:00
ebanks
bf451873ff
1. Bug fix: check that AF=0 doesn't contain more probability than 1-fraction
...
2. Fix for Kiran: allow UG to call SNPs at deletion sites; we'll add an annotation to the VariantAnnotator for deletions at the locus (next week).
3. Added integration tests for joint estimation model
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2038 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 18:02:18 +00:00
asivache
1be36ca959
Bug fix: when cleanedReadIterator is initialized, it is immediately set to the contig of the first cleaned read; when the first uncleaned read coming in is on a lower contig, this would trigger 'readNextContig' with that lower contig as an argument. As a result, the whole cleaned reads file would be read through to the end and no cleaned reads would ever be seen by the code afterwards. Now we do not call readNextContig if the (uncleaned) read's contig is lower than the current contig already loaded into cleanedReadIterator. The 'readNextContig' method now also throws an exception if the requested contig is less than the currently loaded one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2037 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 15:41:26 +00:00
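The contig-ordering guard described in this fix can be sketched with contigs as integer indices: an uncleaned read only triggers a reload when it is on a later contig than the one already loaded, and the reload method itself rejects backward requests. This is a hypothetical sketch of the logic only; the class and method names are invented, and the real code streams cleaned reads rather than just updating an index.

```java
/** Hypothetical sketch of the contig-ordering guard for the cleaned-read iterator. */
final class CleanedReadContigGuardSketch {
    private int loadedContig;  // index of the contig currently loaded from the cleaned-reads file

    CleanedReadContigGuardSketch(int firstCleanedContig) {
        this.loadedContig = firstCleanedContig;
    }

    /** Called per uncleaned read; returns true only if a later contig had to be loaded. */
    boolean onUncleanedRead(int readContig) {
        if (readContig > loadedContig) {
            readNextContig(readContig);
            return true;
        }
        return false;  // same or lower contig: keep what is loaded, never scan ahead
    }

    /** Refuses to move backwards, mirroring the new exception in the fix. */
    void readNextContig(int contig) {
        if (contig < loadedContig) {
            throw new IllegalStateException(
                "requested contig " + contig + " precedes loaded contig " + loadedContig);
        }
        loadedContig = contig;  // the real code would advance through cleaned reads here
    }

    int loadedContig() {
        return loadedContig;
    }
}
```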
rpoplin
b1376e4216
structure refactored throughout for performance improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2036 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 15:41:09 +00:00
depristo
cff31f2d06
comments for eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2035 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 14:19:31 +00:00
aaron
234bb71747
changed the toVariation() method to take a reference base, instead of using the reference base loaded from the underlying data source (if it was reference aware). Also changed some isVariant() methods which weren't using the passed in ref base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2034 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 06:54:38 +00:00
ebanks
902cf84448
Bug fix: if the most likely allele frequency is 0, don't make a variant call (even if the Qscore for AF=1/n > threshold)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2033 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 04:10:32 +00:00
ebanks
555fb975de
1. Print out allele frequency range (from joint estimation model only).
...
2. Don't print verbose output from SLOD calculation (it's just a repeat of previous output).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2032 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 03:59:13 +00:00
mmelgar
72825c4848
A walker that generates a table of secondary base counts in a bam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2031 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 02:11:23 +00:00
hanna
8145ed4672
Take 2, updating picard with bug fix for bam files containing no reads.
...
Just stomped on the existing md5s because that's what Eric told me to do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2029 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 22:52:08 +00:00
ebanks
61b5fb82ce
2 major changes:
...
1. Add dbsnp RS ID to VCF output from genotyper; to do this I needed to fix the dbsnp rod which did not correctly return this value.
2. Remove AlleleBalanceBacked and instead generalize the arbitrary info fields backing VCFs (and potentially others) in preparation for refactoring VariantFiltration next week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2028 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 22:51:49 +00:00
mmelgar
3742a05760
Now can read E2 or SQ tag.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2027 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 15:18:21 +00:00
aaron
c3c001e02e
cleanup of the traversal output code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2026 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 06:18:10 +00:00
ebanks
0922400ca9
Don't try to calculate ratios when DoC is zero (which happens when calls are made by an LD-aware genotyper)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2025 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:51:44 +00:00
ebanks
697d7e02c8
Remove the lazy initialize functionality. When no calls are made by the genotyper, we still want a vcf file to be output with valid header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2024 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:14:50 +00:00
hanna
2ea85fb62b
Fix some problematic command-line argument naming and descriptions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2023 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:12:26 +00:00
hanna
0c2a957ae0
Better configuration support. Now supports everything that people have expressed interest in except edit distance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2021 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 20:54:49 +00:00
depristo
6c9f86bb4d
Removed unnecessary output and added debugging print() routine
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2020 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 18:37:36 +00:00
ebanks
578dcc54a4
Don't create a record if ref=N
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2018 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 04:32:17 +00:00
hanna
8406325247
New Picard is breaking one of the integration tests.
...
Revert until we find out whether the cause is legit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2017 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 03:59:32 +00:00
hanna
499e7d1d75
Push forward some more delicate merging routines.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2016 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 03:07:34 +00:00
hanna
bae4d3f7ea
Updated Picard with fix for Doug Voet. Thanks Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2015 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 02:01:08 +00:00
hanna
2e4782f202
Command-line arguments for SamReadFilters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2014 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 23:36:17 +00:00
rpoplin
a13cbe1df0
The refactored recalibrator now passes the integration tests as well as my own validation tests. I'm ready to have other people start jamming on the files. I'll make an updated wiki page soon. The refactored recalibrator is currently a bit slower than the old one but there were a lot of great, easy ideas today for how to improve it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2013 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 22:20:06 +00:00
hanna
2cf9670d1e
Allow users to directly specify filters from the command-line, applicable to
...
any walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2012 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 18:40:16 +00:00
ebanks
6a37090529
Output changes for VCF and UG:
...
1. Don't cap q-scores at 99
2. Scale SLOD to allow more resolution in the output
3. UG outputs weighted allele balance (AB) and on-off genotype (OO) info fields for het genotype calls (works for joint estimation model and SSG)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2011 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 16:31:31 +00:00
rpoplin
1e7ddd2d9f
Added a validateOldRecalibrator option to CovariateCounterWalker which reorders the output to match the old recalibrator exactly. This facilitates direct comparison of output. Changed the -cov argument slightly to require the user to specify both ReadGroupCovariate and QualityScoreCovariate to make it more clear to the user which covariates are being used. Some speed up improvements throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2010 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 15:55:56 +00:00
depristo
7e30fe230a
oops, missing file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2009 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 13:25:18 +00:00
depristo
d316cbad4c
VariantFiltration now accepts a VCF rod in addition to an input geli. It will then annotate this VCF file with filtering information in the INFO field too. --OnlyAnnotate will not write filtering output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2008 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 13:24:58 +00:00
aaron
f9819d5f13
a little clean-up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2007 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 06:18:34 +00:00
aaron
2ed423ed56
print the current location in read walkers (in addition to the number of reads processed), along with some refactoring to support the change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2006 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 05:57:01 +00:00
ebanks
c9c3cf477a
Based on feedback from Kiran, we now uniquify sample names as sample.rodName (instead of sample.1, sample.2, ...)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2005 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 02:41:37 +00:00
ebanks
2fa2ae43ec
Enough people have found this useful, so...
...
Moving Callset Concordance tool to core and adding integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2003 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:59:18 +00:00
ebanks
3793519bd4
-Added convenience method to VCF record to tell if it's a no call and have rodVCF use it before querying for info fields
...
-Don't restrict info fields to 2-letter keys
[about to move these to core]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2002 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:52:51 +00:00
rpoplin
740a5484c4
Added some documentation to the code, especially to CovariateCounterWalker, with various comments added throughout. Also changed the HashMap data structure to accept an estimated initial capacity. This gave a very modest speed improvement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2001 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:13:56 +00:00
ebanks
74751a8ed3
-Some minor fixes to get accurate vcf record merging done
...
-Improvement to snp genotype concordance test
And with that, it looks like I get revision #2000.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 06:40:55 +00:00
ebanks
ab705565cf
Completely refactored the Callset Concordance code. Now, it takes in VCF rods and emits a single VCF file which has merged calls from all inputs and is annotated (in the INFO fields) with the appropriate concordance test(s).
...
Still needs a bit of polish...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1999 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 05:03:13 +00:00
ebanks
bc6f24e88f
Added VCFUtils which contains some useful VCF-related functions (e.g. ability to merge VCF records).
...
Also, various minor improvements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1998 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:53:32 +00:00
ebanks
cff645e98b
convenience method to deal with genotypes that are unsorted (e.g. CA vs. AC)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1997 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:45:49 +00:00
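A convenience method that treats unsorted genotypes such as CA and AC as equivalent can be sketched by sorting the genotype's allele characters before comparing. The class and method names are hypothetical, and single-character alleles are assumed.

```java
import java.util.Arrays;

/** Hypothetical sketch: normalize genotype strings so allele order does not matter. */
final class GenotypeSortSketch {
    private GenotypeSortSketch() {}

    /** Returns the genotype with its alleles in lexicographic order, e.g. "CA" becomes "AC". */
    static String sorted(String genotype) {
        char[] alleles = genotype.toCharArray();
        Arrays.sort(alleles);
        return new String(alleles);
    }

    /** True if two genotypes carry the same alleles regardless of order. */
    static boolean sameGenotype(String a, String b) {
        return sorted(a).equals(sorted(b));
    }
}
```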
kiran
7fde6c0bf4
One more output tweak.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1996 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:42:55 +00:00
kiran
00a7113d7a
Tweaks to formatting of output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1995 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:33:36 +00:00
ebanks
7ce0df76f8
Added accessors to the rod data sources so that walkers can access the name/file/type triplets for input rods. This is necessary if e.g. you want to create a vcf writer based on all of the samples being input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1994 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:25:39 +00:00
ebanks
d07f3bb6f6
Added methods to get strand bias and to test if record has allele freq or bias fields set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1993 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:20:35 +00:00
kiran
3313b0ddb4
Fixed a minor bug where the lodThreshold wasn't being printed in the header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1992 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:51:36 +00:00
kiran
95d381efe2
Optionally computes the error rate using the best base and a random base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1991 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:47:34 +00:00
kiran
567f5758d2
Optionally lists read depths by read group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1990 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:39:19 +00:00
kiran
a679bdde18
FindContaminatingReadGroupsWalker lists read groups in a single-sample BAM file that appear to be contaminants by searching for evidence of systematic underperformance at likely homozygous-variant sites.
...
Procedure:
1. Sites that are likely homozygous-variant but are called as heterozygous are identified.
2. For each site and read group, we compute the proportion of bases in the pileup supporting an alternate allele.
3. A one-sample, left-tailed t-test is performed with the null hypothesis being that the alternate allele distribution has a mean of 0.95 and the alternate hypothesis being that the true mean is statistically significantly less than expected (pValue < 1e-9).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1989 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:36:39 +00:00
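Step 3 of the procedure above rests on the one-sample t statistic t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom, where mu0 = 0.95. The sketch below computes only the statistic using the standard library; turning it into a left-tailed p-value needs the Student's t CDF, for instance from a library such as Apache Commons Math (its TDistribution class). The class name is hypothetical and this is not the walker's code.

```java
/** Hypothetical sketch of the one-sample t statistic from step 3. */
final class ContaminationTTestSketch {
    private ContaminationTTestSketch() {}

    /**
     * @param altFractions per-site fraction of a read group's bases supporting the alternate allele
     * @param expectedMean null-hypothesis mean (0.95 in the procedure above)
     * @return t statistic with altFractions.length - 1 degrees of freedom; strongly negative
     *         values favour the left-tailed alternative (true mean below expectedMean).
     *         If every fraction is identical the standard deviation is 0 and the result
     *         is infinite or NaN.
     */
    static double tStatistic(double[] altFractions, double expectedMean) {
        int n = altFractions.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two sites");
        }
        double sum = 0;
        for (double x : altFractions) {
            sum += x;
        }
        double mean = sum / n;
        double squares = 0;
        for (double x : altFractions) {
            squares += (x - mean) * (x - mean);
        }
        double sd = Math.sqrt(squares / (n - 1));  // sample standard deviation
        return (mean - expectedMean) / (sd / Math.sqrt(n));
    }
}
```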
kiran
2225d8176e
A convenience class for maintaining a dynamically growing table of values with access to the elements by named row and column identifiers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1988 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:34:35 +00:00
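A dynamically growing table addressed by named rows and columns can be sketched with nested insertion-ordered maps: rows and columns appear the first time they are written, and unset cells read as null. NamedTableSketch and its methods are hypothetical; the convenience class in this commit may expose a different API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a table that grows as new row and column names are used. */
final class NamedTableSketch<V> {
    private final Map<String, Map<String, V>> rows = new LinkedHashMap<>();
    private final Set<String> columns = new LinkedHashSet<>();

    /** Stores a value, creating the row and column on first use. */
    void set(String row, String column, V value) {
        rows.computeIfAbsent(row, r -> new LinkedHashMap<>()).put(column, value);
        columns.add(column);
    }

    /** Returns the stored value, or null if the cell was never set. */
    V get(String row, String column) {
        Map<String, V> r = rows.get(row);
        return r == null ? null : r.get(column);
    }

    /** Row names in first-insertion order. */
    List<String> rowNames() {
        return new ArrayList<>(rows.keySet());
    }

    /** Column names in first-insertion order, across all rows. */
    List<String> columnNames() {
        return new ArrayList<>(columns);
    }
}
```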
hanna
21c5f543fa
Fix sharding bug -- loci to which >100,000 (= 1 shard) reads are assigned an
...
alignment start will confuse the sharding system and cause it to return duplicate reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1987 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 14:27:26 +00:00
rpoplin
84ba604611
Sequential quality score calculation is now in place in the refactored recalibrator and matches the quality scores calculated by the old recalibrator exactly, at least on the small sets of data used so far. Validation, documentation, and optimization work is ongoing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1985 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 15:55:16 +00:00
depristo
bf1bc94060
Fixes for PooledConcordance bugs and lack of safety checking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1984 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 01:54:10 +00:00
rpoplin
66d4a995e6
Initial check in of refactored Recalibrator. The new walkers are called CountCovariatesRefactored and TableRecalibrationRefactored. More work is needed to finish up the sequential calculation and to document the code sufficiently. These files are not ready to be used by other people quite yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1982 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 22:33:55 +00:00
ebanks
6fdfc97db6
Added optional field DP to VCF output for Mark.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1981 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 20:03:22 +00:00
ebanks
0a55fa5bb1
Completely refactored the Genotype Concordance module(s).
...
Now PooledConcordance and GenotypeConcordance inherit from the same super class (and can therefore share data structures and functionality). Also, they now use ConcordanceTruthTable to keep track of necessary info.
GenotypeConcordance passes integration tests.
PooledConcordance needs to be finished by Chris.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1979 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 16:27:16 +00:00
ebanks
d549347f25
Refactored GenotypeLikelihoods to use an underlying 4-base model.
It needs to be modified a bit and then hooked up to a pooled model, but that is now possible.
At this point, there is no difference to the Unified Genotyper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 21:59:25 +00:00
jmaguire
4d3871c655
don't flush anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1977 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 19:11:51 +00:00
aaron
aacd72854f
a fix for a bug Andrey discovered: in read-based interval traversals we're duplicating reads in rare cases. The problem was that to accommodate a bug in SAM JDK indexing, we were forced to add one to the stop of our QueryOverlapping() calls to ensure we always got all of the overlapping reads.
Added a PlusOneFixIterator that wraps other iterators and eliminates reads that start outside of our intended interval (interval stop - 1). Updated and checked BamToFastqIntegrationTest MD5 sums.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1976 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 05:26:33 +00:00
hanna
43c3ee61d5
Fix minor mapping quality bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1973 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 14:33:23 +00:00
ebanks
a545859c62
Joint Estimation model now emits a reasonable slod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1969 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 21:12:42 +00:00
ebanks
11d950abe0
No longer allow the lod_threshold argument - use confidence instead.
Have UG output qscores in all cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1968 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:18:51 +00:00
asivache
2fb45dbd73
Make window size a command line argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1967 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:13:35 +00:00
asivache
55f61b1f88
Bug fix in adjustment of the shift position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1966 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:08:11 +00:00
depristo
5d5dc989e7
improvements to VCF and variant eval support of VCF -- now listens to the filter field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1963 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 12:09:30 +00:00
hanna
c63af32fc7
The BWA/C bindings were triggering the local aligner to repeatedly reload the ref genome. Make sure the reference genome is cached.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1961 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 00:01:55 +00:00
ebanks
3a33401822
2nd stage of the genotyper output refactoring is complete.
Now, all output is generalized and all of the intelligence lies where it is supposed to.
Next stage is syncing up old and new models and making sure we're outputting exactly what we should.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 22:43:08 +00:00
aaron
ba67c7f02b
added a warning for those using bed files; we properly convert bed to the internal representation but the user needs to be aware that any output will be one-based closed intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1959 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 21:09:18 +00:00
aaron
b71b66bd88
the underlying parameter is a float so we need to use Float.valueOf() instead. Noticed by external user Hou Huabin
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1958 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 20:22:25 +00:00
hanna
5a510e6d98
New PackageUtils interferes with the packaging utility. Revert until Aaron and I can get together to make this work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1957 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 19:14:14 +00:00
aaron
de6ae51f7e
Scala walkers can now be built and run like any other walker in the GATK. Added getUrlsForClasspath to PackageUtils: the Reflections package isn't getting the manifest files from jars in the classpath, so we weren't seeing any walkers outside of the GenomeAnalysisTK.jar.
A couple of notes:
-Commented out BaseTransitionTableCalculator.scala because it won't build; Chris, could you fix this one (or kill it if it's not needed)?
-Removed the PrintReadsScala walker; moved the code over to a ScalaCountLoci walker (which is what the code was really doing).
-Added configuration items to the ivy xml file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1956 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 06:02:41 +00:00
hanna
1896f334d9
Fixed collection of bugs in reads aligning to multiple locations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1955 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 04:02:09 +00:00
ebanks
af6d0003f8
-Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out).
-Make rods return the appropriate type of Genotype calls from getGenotype().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 05:35:47 +00:00
hanna
b95165e39c
Make alignment (temporarily) part of main GenomeAnalysisTK.jar. Add some extra logging errors on failure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1953 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 00:33:18 +00:00
asivache
4b0796ba58
After fixing a few glitches and bugs, this version finally works as intended
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1952 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 04:59:58 +00:00
depristo
7d0ac7c6f2
Fix for long-term VariantEval bug plus new integration test to catch it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1951 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 00:00:33 +00:00
asivache
ea8d5c7077
Some internal refactoring. Now "safely" ignores duplicate records (NOT duplicate reads but rather malformed bam files!) resulting from the bug/feature in CleanedReadInjector.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1949 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 17:50:51 +00:00
hanna
a3da475c88
Documentation and cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1946 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 15:40:28 +00:00
hanna
2d15891719
Created walkers for alignment, validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1945 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 15:04:07 +00:00
ebanks
51fffc7f69
Comments for Ryan (which also apply to ReadQualityScoreWalker).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1944 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 14:44:04 +00:00
ebanks
ccd7440730
We can actually make this a bit simpler (and faster)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1943 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:21:03 +00:00
ebanks
1b6333e4ab
Enough people have asked for this that it just needed to get written.
One can now split up any number of sets into an N-way Venn (although it doesn't check for discordance in the calls, so you'll still want to use SimpleVenn for 2-way comparisons).
Wiki docs are updated.
To do: update to use Ryan's generic hash map when it's ready for public use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1942 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:08:45 +00:00
ebanks
4bdb5b03bd
tell UnifiedGenotyper to return calls at all bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1941 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 03:10:44 +00:00
ebanks
4ee1d6f733
-Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls.
-Do the right thing in all models for all-base-mode (for Kiran).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 02:35:51 +00:00
ebanks
64ac956885
Okay, I caved in:
CallsetConcordance now gets possible concordance types by looking at classes that implement ConcordanceType instead of having them hard-coded in.
Thanks to Kiran this was pretty easy...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1939 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 00:32:26 +00:00
hanna
1f0d852a48
Fix bug where alignments with indels would be busted because bwa reverses the read bases to undo a previous read base reverse that doesn't occur in the libbwa codepath.
Also fixed some memory management issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1938 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 21:33:13 +00:00
asivache
e3b4d4cbed
Genotyper reimplemented. Does the same thing, at least for now, but internal data structures redesign enables collecting various statistics for indel-containing/reference-matching reads. The statistics are not yet used by the caller itself to make a better judgement w.r.t. the validity of the calls it makes, but they are now printed into the output stream (--verbose). The statistics (for both normal and tumor) include: indel observation count/total coverage, av. number of mismatches per indel-containing and per ref-matching read, av. mapping quality, av. mismatch rate and av. base quality within an NQS window around the indel, numbers of indel and ref observations per strand.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1936 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 19:09:16 +00:00
hanna
f04b80d7db
Fixed epic memory leak.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1934 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 16:32:43 +00:00
ebanks
1c4ca9d383
-Mark just reminded me: actually force the ref/loc to be immutable
-VCF writer should be blind to the score/confidence/lod value - just print the thing out as is
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1932 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:41:53 +00:00
ebanks
5cdbdd9e5b
now that the design is stable, pull the setReference and setLocation methods back out of Genotype and stick them into constructors of implementing classes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1931 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:27:37 +00:00
ebanks
3091443dc7
Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron.
Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 03:46:41 +00:00
depristo
86573177d1
Reverting rod walkers to use underlying refwalker implementation while we work on ROD2 and reenable the system. Added some serious sparse file parsing to variant eval tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1929 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 01:04:37 +00:00
hanna
c9a3707cfd
Initial version of BWA/C bindings. Still lots of squirrels roaming the code.
- Some cigar strings aren't right.
- Memory leaks.
- BWA codebase changes aren't committed to BWA tree.
- Aligner interface butchered to support BWA/C-style alignments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1928 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 21:37:49 +00:00
chartl
c4359bc340
Whoops. Forgot the implements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1927 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:59:57 +00:00
aaron
5a3bd50537
adding error log reporting to the GATK, and a stream based output method for the argument collection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1926 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:56:05 +00:00
chartl
863d3023d5
IndelCounterWalker -- a new little walker that counts indels over a region (want to see what kind of havoc BWA may be causing). Don't know when BasicPileup.indelPileup() was written, but kudos to whoever wrote it.
BTTJ - remove 'N's from previous base analysis -- even if both read and ref are 'N' (which does happen, occasionally)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1925 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 19:50:50 +00:00
aaron
04e9a494e9
removed the GenotypesBacked interface, which is currently unused. Also cleaned up some documentation lines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1924 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 18:08:14 +00:00
rpoplin
06ff81efe5
Added NeighborhoodQualityWalker.java and ReadQualityScoreWalker.java which are used to calculate a read quality score based on attributes of the read and the reads in the neighborhood.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1922 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-28 13:24:11 +00:00
depristo
68fa6da788
Initial graph-based reference implementation and alignment assessor. Not suitable for public use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1921 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:54:47 +00:00
depristo
31d143a841
now only needs READS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1920 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:54:14 +00:00
depristo
ef2ea79994
code cleanup and containsStartPosition function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1919 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:53:40 +00:00
depristo
186a8dd698
Trivial protection for null value
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1918 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:52:52 +00:00
depristo
be333da9c0
charSeq2byteSeq -- convert a char[] to a byte[] for convenience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1917 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 21:52:23 +00:00
chartl
4192b093b8
More robust error handling with parallelization + usePreviousBase. Added forceReadBasesToMatchRef to use in conjunction with nPreviousReadBases as a less stringent approximation of usePreviousBases (requiring previous pileups only had mismatches, and that read mapping quality be high was throwing everything away)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1916 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 17:20:44 +00:00
chartl
31d5df2859
Previous base now checks that the read matches the reference in the previous base window.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1915 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:58:20 +00:00
depristo
726378be8b
Almost ready to stop doing eager decoding; waiting on Eric
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1914 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 15:28:05 +00:00
ebanks
e96b1791ab
Need to check for biallelic snp or exception gets thrown.
Also, update to new tracker calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1913 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-27 02:43:43 +00:00
aaron
3fb3773098
a fix for traverse duplicates bug: GSA-202. Also removed some debugging output from FastaAltRef walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1912 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:55 +00:00
hanna
a1e8a532ad
Support for initialize() and onTraversalDone() output from parallelized walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1911 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:31 +00:00
chartl
62c1001790
BTTJ is now correct. What a terrible waste of time, turns out I'd just reversed the header. Because of this the MD5 had to be updated in the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1910 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 19:24:18 +00:00
sjia
24c7f694e6
Handles allele frequencies for any specified population, changed user input for mismatch filter options
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1909 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 22:51:56 +00:00
chartl
db9419df49
@ Hack to allow output from onTraversalDone()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1908 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 15:19:04 +00:00
ebanks
75ad6bbef7
Check that map isn't being called passing in null arguments.
(This seems wrong; see JIRA entry GSA-211)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1907 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-25 02:30:36 +00:00
depristo
b4f55df600
Bugfix for Jason F
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1906 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-24 22:09:27 +00:00
hanna
65b98470f3
Temporary fix: have RodLocusView manage and close its RODs. Really the relationship between these two classes needs to be rethought; see JIRA GSA-207.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1904 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 16:00:12 +00:00
aaron
ad1fc511b1
intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 06:31:15 +00:00
ebanks
6c338eccb8
Joint Estimation model now emits calls in all formats.
The whole GenotypeCall framework needs to be changed, but this will work for the time being.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1902 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-23 03:07:28 +00:00
chartl
a6dc8cd44e
BTTC is now Tree Reducible allowing for parallelization.
Integration test comment changed to reflect actual date of last md5 update.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1901 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 23:19:29 +00:00
hanna
2e552eb5a1
Validates intervals against sequence dictionary header bounds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1900 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 19:31:15 +00:00
ebanks
54c61c663c
-Cleanup of the Joint Estimation code
-Don't print verbose/debugging output to logger, but instead specify a file in the argument collection (and then we only need to print conditionally)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1899 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 15:25:29 +00:00
asivache
2cab4c68d4
Added method: isCodingExon(). Returns true if position is simultaneously within an exon AND within coding interval of any single transcript from the list. The old method of detecting coding positions as isExon() && isCoding() is buggy, as the position could be in the UTR part of one transcript (isExon() is true), and within coding region bounds (but not in the exon) of another transcript (isCoding() is true). As a result UTR positions would be erroneously annotated as coding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1898 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-22 14:55:07 +00:00
chartl
af761fb9bd
Base transition table now forces epsilon/3 (three-state) model for the unified genotyper. Verified to be identical with changing the default model to being epsilon/3. This of course changes the observed counts, so the integration test has been updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1897 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 21:18:26 +00:00
ebanks
55fa1cfa06
-Renamed new calculation model and worked out some significant changes with Mark
-Allow walkers calling the UG to pass in their own argument collections
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1896 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 20:49:36 +00:00
chartl
8e3f72ced9
BTTJ - Code refactoring (major) - passes integration test
VariantEvalWalker - whoops, wrote PooledGenotypeAnalysis rather than PooledAnalysis, now passes tests again
- PooledFrequencyAnalysis - don't bother initializing matrices if this isn't a pool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1895 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 19:04:51 +00:00
depristo
15a1849758
notes for chartl
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1894 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 18:31:31 +00:00
chartl
77863d4940
@PowerBelowFrequency
+ Changes to doc
@ BasicPoolVariantAnalysis
+ use char rather than ReferenceContext
+ calculate # alleles
@ PooledFrequencyAnalysis
+ breakdown of call metrics by estimated number of alleles in pool
@ VariantEvalWalker
+ add PooledFrequencyAnalysis to analysis set
@ PooledGenotypeConcordance
+ correctly calculate maximal allele frequency for output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1893 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 15:17:11 +00:00
chartl
967128035e
Make command line args default to false.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1892 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 13:59:35 +00:00
ebanks
9b9744109c
Mark's new unified calculation model is now officially implemented.
Because it doesn't actually use EM, it's no longer a subclass of the EM model.
Note that you can't use it just yet because it doesn't actually emit calls (just prints to logger). I need to deal with general UG output tomorrow. Hold off until then, Mark, and then you can go wild.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1891 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-21 02:39:23 +00:00
depristo
caa3187af8
Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 23:31:13 +00:00
chartl
4a8a6468be
Use read group as a condition for confusion tables. With an integration test.
Changed BaseTransitionTable to comparable objects for consistent ordering of output
( e.g. so the integration test doesn't yell so much )
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1889 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 19:39:32 +00:00
chartl
b83df5616a
Change for lower-case references (always compare upper case bases)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1888 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 17:36:31 +00:00
chartl
3b1fabeff0
Major code refactoring:
@ Pooled utils & power
- Removed two of the power walkers leaving only PowerBelowFrequency, added some additional
flags on PowerBelowFrequency to give it some of the behavior that PowerAndCoverage had
- Removed a number of PoolUtils variables and methods that were used in those walkers or simply
not used
- Removed AnalyzePowerWalker (un-necessary)
- Changed the location of Quad/Squad/ReadOffsetQuad into poolseq
@NQS
- Deleted all walkers but the minimum NQS walker, refactored not to use LocalMapType
@ BaseTransitionTable
- Added a slew of new integration tests for different flaggable and integral parameters
- (Scala) just a System.out that was added and commented out (no actual code change)
- (Java) changed a < to <= and a boolean formula
Chris
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1887 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:58:04 +00:00
aaron
4be6bb8e92
added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums. For some reason my check-ins from home wouldn't work last night, so these are the actual changes for 1884.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1886 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 14:15:33 +00:00
depristo
449a6ba75a
Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 12:23:36 +00:00
aaron
d749a5eb5f
added a check to ensure the eval track variation is bi-allelic. Also changed some string constants over to enums
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1884 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 04:56:51 +00:00
ebanks
b8ab77c91c
Don't filter out reads without proper read groups. Instead, allow the user (or another walker calling UG) to specify an assumed sample to use (but then we assume single-sample mode).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1883 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:30:53 +00:00
depristo
a8a2c1a2a1
Replaced SSG with UG in packaging utils. Minor performance and formatting improvements for ClipReads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1882 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 01:19:58 +00:00
ebanks
c29924e7cf
Reverting previous change.
Aaron, it's all yours...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1881 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:55:24 +00:00
aaron
d21b582b18
memory leak, where the Resource Pool was releasing based on the value and not the key, resulting in the resourceAssignments map growing with each additional shard
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1880 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:39:42 +00:00
ebanks
761a730758
assertBiAllelic -> assertMultiAllelic.
Chris, if this breaks an integration test, you get it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1879 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:09:46 +00:00
depristo
2a26bb42dd
Softclipping support in clip reads walker. Minor improvement to WalkerTest -- now can specify file extensions for tmp files. Matt -- I couldn't easily create a non-presorted SAM file. The softclipper has an impact on this.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1878 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 21:54:53 +00:00
chartl
055a99fb05
Change in ordering for a disjunction. Walker will no longer try to calculate number of simple mismatches in the pileup if the pileup includes 'N's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1877 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:24:14 +00:00
aaron
cfa86d52c2
ensure that in the indel case we don't allow identification as both an insertion and deletion at the same location in the VCF ROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1875 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 18:21:00 +00:00
chartl
3d50c72d74
Forgot a dumb little System.out.println. You will be flooded with "This read will not be used." statements until, overwhelmed, you give in to my demands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1874 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:13:48 +00:00
chartl
225ef52973
Now produces same output as the Scala walker for unconditioned tables (no 2bb, no previous base, etc.)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1873 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-19 16:10:44 +00:00
ebanks
51f9ec0a5c
subtract largest posterior value from all values; this hopefully solves any precision issues
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1870 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 05:20:15 +00:00
ebanks
b9e8867287
-push allele frequency and genotype likelihood variable definitions down into the subclasses so that they can use different data structures
-use slightly more stringent stability metric
-better integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1869 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-18 04:22:17 +00:00
depristo
d6385e0d88
simpleComplement() function in BaseUtils. Generic framework for clipping reads along with tests. Support for Q score based clipping, sequence-specific clipping (not1), and clipping of ranges of bases (cycles 1-5, 10-15 for example). Can write out clipped bases as Ns, quality scores as 0s, or in the future will support softclipping the bases themselves.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1868 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 22:29:35 +00:00
chartl
ad777a9c14
@BasicPileup - made the counts public so they can be used
@PoolUtils - split reads by indel/simple base
@BaseTransitionTable - complete refactoring, nicer now
@UnifiedArgumentCollection - added PoolSize as an argument
@UnifiedGenotyper - checks to ensure pooled sequencing uses the appropriate model
@GenotypeCalculationModel - instantiates with the new PoolSize argument
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1867 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 21:56:56 +00:00
andrewk
bdb34fcf38
Updated integration tests for VariantEval. Hooray for IT!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1866 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 20:00:29 +00:00
hanna
85a4fbc256
Bumping version of Picard for firehose compatibility.
Integration tests were validated against svn rev 1861, before the wonder
twins committed their changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1864 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 19:38:56 +00:00
aaron
8aacc43203
VCF output now emits no calls as ./.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1863 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 18:51:31 +00:00
andrewk
d1a4cd2f73
Added ValidationData analysis type to VariantEvalWalker; this eval takes a GFF file with validated truth data positions (bound to "validation") and calculates the accuracy of the genotype calls bound to "eval".
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1862 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 15:39:08 +00:00
ebanks
418e007ca6
A cleaner interface: now everyone can use UG's initialize method
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1860 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 14:09:16 +00:00
aaron
96972c3a5c
a fix for a bug Eric found: if your first call contains fewer samples than calls at other loci, your VCFHeader got set up incorrectly.
Also moved a bunch of Lists over to Sets for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1859 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:57:50 +00:00
aaron
a69ea9b57c
Cleaning up the VCF code, adding lots of tests for a variety of edge cases. Two issues are still outstanding: updating the no call string with the standard 1000g decided on today, and fixing Eric's issue where not all the VCF sample names are present initially.
also: their, I hope your happy Eric, from now on I'll try not to flout my awesomest grammer in the future accept when I need to illicit a strong response :-)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1858 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 04:11:34 +00:00
ebanks
b82c3b6040
Better error output (and fixed spelling mistakes)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1857 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 01:01:45 +00:00
ebanks
993c567bd8
I had to remove some of my more aggressive optimizations, as they were causing us to get slightly different results from MSG. This results in only a small cost to running time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1856 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-16 00:59:32 +00:00
asivache
7d7ff09f54
throw an exception if read has no associated read group
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1855 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 18:11:32 +00:00
chartl
b9544d3f89
Output formatting change (very slight)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1854 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 16:47:29 +00:00
hanna
839c5d66bc
Read uints directly into longs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1853 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 16:15:11 +00:00
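The "Read uints directly into longs" change works around Java having no unsigned 32-bit type. A minimal sketch of the usual technique, assuming the on-disk values are 32-bit unsigned integers in little-endian order (the byte order and the class name `UnsignedIntReader` are illustrative assumptions, not taken from the BWA reader itself):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnsignedIntReader {
    /**
     * Reads the next four bytes of the buffer as an unsigned 32-bit integer.
     * getInt() yields a signed int; widening it to long sign-extends, so the
     * mask clears the upper 32 bits and leaves the unsigned value intact.
     */
    static long readUInt(ByteBuffer buffer) {
        return buffer.getInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Little-endian order is an assumption about the file format.
        ByteBuffer buffer = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putInt(0xFFFFFFFE); // bit pattern of 4294967294 when read as unsigned
        buffer.putInt(42);
        buffer.flip();
        System.out.println(readUInt(buffer)); // prints 4294967294
        System.out.println(readUInt(buffer)); // prints 42
    }
}
```

On Java 8 and later, `Integer.toUnsignedLong(buffer.getInt())` is an equivalent standard-library call.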
hanna
ce38fa7c81
Breaking the signed int glass ceiling; stage 1: convert critical ints to longs. Code cleanup and documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1852 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 15:28:56 +00:00
kcibul
79993be46c
changed blank gene name to UNKNOWN
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1851 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 13:47:00 +00:00
depristo
0c2016c19a
Improved error messages -- now easier to read, points to the GATK Error Messages wiki, and avoids double printing of stack traces
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1850 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 12:07:44 +00:00
aaron
a9094c835c
clean-up and fixes to the VCF input
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1849 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 04:53:59 +00:00
ebanks
a32470cea1
Deal with the fact that walkers can call UG's init/map functions directly.
We need to filter contexts in that case since the calling walkers don't get UG's traversal-level filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1848 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 02:31:45 +00:00
hanna
8dca236958
Base-packed reader cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1847 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-15 01:26:23 +00:00
hanna
316b30ee56
On the road to human: make sure the suffix array will fit in a Java array.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1846 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 21:45:35 +00:00
ebanks
e740e7a7ce
Because walkers call UG's map function, we need to move the actual writing out to UG's reduce function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1845 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:49:26 +00:00
kcibul
825e6c7a4d
added calculation for bases over 2x, 10x, 20x, 30x plus gene name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1844 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:32:26 +00:00
aaron
727b69fce0
catch null output destinations earlier
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1843 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 20:07:15 +00:00
chartl
1f66738c8e
Fix a hashing function bug. Ignore reads with non-reference bases in the pileup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1842 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:41:26 +00:00
hanna
72c34f11dd
Bug fixing for BWA output formats.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1841 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:32:22 +00:00
aaron
60183229ab
the oldest java mistake in the book...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1840 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:32:13 +00:00
ebanks
52d2e0ca07
All walkers now use read.getReadGroup()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1839 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 19:27:40 +00:00
chartl
0a09fa4d5c
Rename to distinguish this transition table calculator from the scala version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1838 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:52:21 +00:00
chartl
1d055011bd
Getting rid of this so I can rename it without the world blowing up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1837 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:45:11 +00:00
aaron
eb90e5c4d7
changes to VCF output, and updated MD5s in the integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1836 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:42:48 +00:00
ebanks
89771fef05
-Use read.getReadGroup()
-Add another filter for read groups for Chris
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1835 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 18:08:32 +00:00
ebanks
311ab8da5a
A helper class to create the masks for the sequenom design maker.
This project is now officially done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1834 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:28:51 +00:00
hanna
3553fc9ec0
Preparing for human -- support bwa output files directly rather than relying on a custom fixed sa interval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1833 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:17:46 +00:00
ebanks
0c95d6906f
Merge both versions of the Sequenom assay design maker: use Jared's base code and add in indels. (Jared, this still emits the same output for SNPs as your original version.)
Remove all sequenom stuff from the FastaAlternateReferenceMaker so it can just concentrate on making alternate references...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1831 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:11:45 +00:00
ebanks
49af5269e5
Jared: feel free to change or revert, but until we move over to UG version...
Only print out positions with at least one non-ref call
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1830 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 17:08:57 +00:00
chartl
f5a2e6dd50
Fix!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1829 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 16:15:20 +00:00
ebanks
f2886d88e0
We now emit genotype calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1828 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 02:49:56 +00:00
ebanks
1b214c0de5
Fixed logic: throw exception if contigs are NOT equal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1827 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 02:48:44 +00:00
ebanks
aeca14d052
On our side of 5CC, we spell multi M-U-L-T-I.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1826 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 01:41:25 +00:00
ebanks
c9c8fd1fef
Added the discovery LOD score to the meta data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1825 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 01:24:06 +00:00
hanna
a76fac4687
Cleanup existing speedups. Minor performance improvements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1823 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 21:51:18 +00:00
hanna
837ae1d33a
Optimization: from 22k reads/min to 30k reads/min.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1822 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:59:29 +00:00
ebanks
96b8499a31
Remodeled version of the UnifiedGenotyper.
We currently get the same lods and slods as MultiSampleCaller (except slods for ref calls, as I discussed with Jared) and are a bit faster in my few test cases. Single-sample mode still emulates SSG.
The remaining to-do items:
1. more testing still needed
2. we currently only output lods/slods, but I need to emit actual calls
3. stubs are in place for Mark's proposed version of the EM calculation and now I need to add the actual code.
More check-ins coming soon...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1821 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:27:01 +00:00
ebanks
b28446acac
Multi-sample calls now have associated meta-data (SLOD, allele freq), which will soon actually be used...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1820 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:08:43 +00:00
hanna
db642fd08b
Optimization: from 10k reads/min to 22k reads/min.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1819 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 18:07:15 +00:00
aaron
77499e35ac
fixes for GSA-199: Need easier way to write binary outputs to standard output. GLF and VCF now have stream constructors, and can get dumped to standard out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1818 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:20 +00:00
hanna
f37564e63a
Our BWA is now looking at roughly the same number of candidate alignments as BWA/C. Performance is now at 11k reads/min, still a long way from BWA/C.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1817 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:04 +00:00
chartl
8d0e057d83
I got bored today and decided to write the confusion matrix calculator. At present it is untested. I'm submitting it to subversion to make sure I have a previous revision to revert to.
This is a calculator that will calculate:
P[ True base is X | read base mismatches, secondary base is Y, previous K bases are Z1,Z2,...ZK ]
where the number of previous reference bases to take into account is user-defined. The secondary base is optional as well.
--usePreviousBases k
tells the walker to use the k previous reference bases in the transition table
--useSecondaryBase
tells the walker to use the secondary base at a locus in the transition table
these can be used together.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1816 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 02:55:29 +00:00
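The confusion matrix calculator in commit 1816 estimates P[ True base is X | context ]. A minimal sketch of how such a table can be filled by counting, assuming the context is encoded as a string key built from the mismatching read base, an optional secondary base, and the k previous reference bases. The class and method names are hypothetical, and this is a plain count/total (maximum-likelihood) estimate, not necessarily what the walker itself computes:

```java
import java.util.HashMap;
import java.util.Map;

class TransitionTableSketch {
    private static final String BASES = "ACGT";
    // Context key -> counts of the true base, indexed A, C, G, T.
    private final Map<String, long[]> counts = new HashMap<>();

    /** Builds a context key; pass null for secondaryBase when it is not used. */
    static String contextKey(char readBase, Character secondaryBase, String previousRefBases) {
        return readBase + "|" + (secondaryBase == null ? "-" : secondaryBase) + "|" + previousRefBases;
    }

    /** Records one mismatch whose true base is known; assumes the base is one of A, C, G, T. */
    void observe(String context, char trueBase) {
        counts.computeIfAbsent(context, k -> new long[4])[BASES.indexOf(trueBase)]++;
    }

    /** Empirical P[true base = trueBase | context]; returns 0.0 for an unseen context. */
    double probability(char trueBase, String context) {
        long[] row = counts.get(context);
        if (row == null) return 0.0;
        long total = 0;
        for (long c : row) total += c;
        return total == 0 ? 0.0 : (double) row[BASES.indexOf(trueBase)] / total;
    }

    public static void main(String[] args) {
        TransitionTableSketch table = new TransitionTableSketch();
        String ctx = contextKey('T', 'C', "GA"); // k = 2 previous reference bases
        table.observe(ctx, 'C');
        table.observe(ctx, 'C');
        table.observe(ctx, 'G');
        table.observe(ctx, 'A');
        System.out.println(table.probability('C', ctx)); // prints 0.5
    }
}
```

Dropping the secondary base or using k = 0 simply yields coarser keys, which mirrors the optional --useSecondaryBase and --usePreviousBases settings described above.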
ebanks
be92a1e603
Don't try to close if the lazy initialize hasn't triggered
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1815 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 01:20:25 +00:00
chartl
ec83bc6ec5
This somehow didn't make it into subversion the last time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1814 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 21:11:13 +00:00
chartl
ecbb11e017
Modified PowerBelowFrequency to ignore reads below a user-defined mapping quality. Request from Jason Flannick.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1813 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:59:24 +00:00
chartl
ec68ae3bc5
Added a filter that will split the read set by a threshold of mapping quality (Request from Jason Flannick)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1812 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:58:37 +00:00
chartl
0d73fe69e7
Recalibrator by NQS. Had this puppy running all afternoon. Thing had got through 100,000,000 reads before I decided to delete my sting tree. *sigh*, a little more delay.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1811 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:55:02 +00:00
chartl
ee0afba0af
Recalibration stuff...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1810 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:51:39 +00:00
ebanks
caf689821f
added method to get normalized posteriors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1809 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 02:33:22 +00:00
ebanks
cf7a26759d
-use the getReadGroup() function that was added to picard for us
-clean up some include lines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1808 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 01:39:32 +00:00
hanna
d844d1c496
SAMFileWriters specified as command-line arguments were sometimes incorrectly altering the default short name. Make sure short name is not specified if shortName is not specified but fullName is.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1807 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 19:16:46 +00:00
hanna
da084357db
Fixed minor typo in output message.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1806 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 18:56:54 +00:00
aaron
62c484b57a
Fixes for GSA-201, where enumerated types in command line arguments had to be defined as all uppercase for the system to work.
Also a little playground walker that changes the sort order flag of a BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1805 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 18:11:32 +00:00
hanna
32d55eb2ff
Fix issue Eric was seeing with java.lang.Error in unmap0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1804 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:46:56 +00:00
ebanks
9f3482ef11
VCF is both a multi- and single- sample format, so we shouldn't be throwing an exception when used for SS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1803 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:43:26 +00:00
jmaguire
d9f5a314ac
avoid an out of memory error by not putting more than 5000 reads in the cache. On pilot1, at least, those are crazy loci anyway.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1802 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 14:56:55 +00:00
hanna
f4b6afb42c
JVM issue id 5092131 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5092131) was causing OOM issues with the new mmapping fasta file reader during large jobs.
Temporarily reverting the reader until a workaround can be found.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1801 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:45:46 +00:00
chartl
6d7f4481e4
Changed traversal type slightly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1800 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:11:48 +00:00
ebanks
a9f3d46fa8
Your time has come, SSG.
Fare thee well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1799 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 20:27:56 +00:00
jmaguire
8fdb8922b8
now output in the exact format that works with sequenom software.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1798 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 20:06:27 +00:00
aaron
98e3a0bf1a
VCF can now be emitted from SSG. The basics are there (the genotype, read depth, our error estimate), but more fields need to be added for each record as necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:50:04 +00:00
hanna
95f24d671d
Fixed 'visualization' of reads that didn't match bwa's alignments exactly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1796 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:45:30 +00:00
kiran
29ad6cd876
Made redundant by BCMMarkDupes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1795 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 18:47:20 +00:00
kiran
94d82d1915
Matthew Bainbridge's duplicate removal utility for 454 data. This code should eventually be moved into a read walker. For now, it's being introduced into the repository as-is (well, with one minor change to make the handling of command-line arguments a little more straightforward).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1794 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 18:32:37 +00:00
ebanks
15bf014e0b
logger.info -> logger.debug (don't want to risk filling up my log on genome-wide calls)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1792 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 17:53:11 +00:00
chartl
f89a89ffe3
Use of AlleleFrequency as an input to PowerAndCoverage is deprecated by the new walker. Reverting to the standard "power at 1 allele" calculation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1788 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 16:07:45 +00:00
chartl
ae05f5c7ad
Fixin the header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1787 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 15:49:28 +00:00
chartl
11ff1e09b8
A new power walker for the user to feed in a number of alleles. Call that number k. Output is:
Locus Power_for_k_alleles Power_for_k-1_alleles Power_for_k-2_alleles ... Power_for_1_allele
This was a request from Jason Flannick & the T2DB group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1786 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 15:35:35 +00:00
ebanks
04fe50cadd
*** We no longer have a separate model for the single-sample case. ***
For now, a single sample input will be special-cased in the EM model - but that will change when the EM model degenerates to the single sample output with a single sample as input. For now, the EM code for multi-samples isn't finished; I'm planning on checking that in soon.
The SingleSampleIntegrationTest now uses the UnifiedCaller instead of SSG, and so should all of you. More on that in a separate email.
Other minor cleanups added too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1785 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 14:08:57 +00:00
jmaguire
32128e093a
misc. changes to get the numbers back to the baseline while keeping the speedup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1784 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 12:27:07 +00:00
jmaguire
d38a0d04b9
fix a snp mask offset error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1783 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 12:25:40 +00:00
kiran
829e99413b
Rescores a variant after removing duplicates (defined very strictly as reads with the same start points).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1782 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 03:07:36 +00:00
hanna
fcb6a992c8
Switched IndexedFastaSequenceFile over to use memory mapping to load data rather than
the loop with a small block size. Performance improvements in loading refs are extreme;
segments can be loaded in <1ms. chr1 in its entirety can be loaded in 1.5sec (down
from 30sec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1781 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 00:07:15 +00:00
jmaguire
02d2492d68
Simple tool for picking sequenom probes for SNPs. Can be extended to indels if necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1780 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 23:46:41 +00:00
ebanks
1905b5defa
Hash by chromosome for now to reduce memory. This is a temporary solution until we decide how to retire the Injector for good.
Also, with Picard's latest changes, we need to make sure we don't double-close the sam writer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1779 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:06:25 +00:00
ebanks
f9a1598d75
Reformatting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1778 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:03:34 +00:00
ebanks
203c626fc2
A wrapper around the GenotypeLikelihoods class for the UnifiedGenotyper. This wrapper incorporates both strand-based likelihoods and a combined likelihoods over both strands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1777 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 19:57:37 +00:00
sjia
5bdcc2b4dc
Included HLA class 2 genes in CreatePedFileWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1776 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:46:51 +00:00
sjia
8f896b734f
Included HLA class 2 genes in CreatePedFileWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1775 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:28:01 +00:00
aaron
f9a0eefe4b
GELI_BINARY is now functional, and can be used as a variant type in SSG (-vf=GELI_BINARY). Also fixed the max mapping quality column in both GELI output formats, which we hadn't been outputting correctly until now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1774 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:20:34 +00:00
chartl
225b9bccc1
Modifications to NQSClusteredZScoreWalker to output empirical mismatch rates on bins by both Z-score and reported Q-score, rather than averaging over all Q-score bins for each Z-score.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1773 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 13:45:12 +00:00
depristo
8dd0924b37
Minor performance improvements to VariantEval -- now all of the CPU time is spent dealing with the ROD system...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1772 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 23:40:30 +00:00
aaron
4554ca1b28
more cleanup, deprecated the old genotype, corrected SNPCallsFromGenotypes' imports and two other classes that depend on it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1771 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 19:09:27 +00:00
aaron
3aec76136f
Removing the AllelicVariant interface, which is replaced by the Variation interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 17:44:24 +00:00
depristo
1bd0c3c145
variant eval allows non Variation rod objects
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1768 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:26 +00:00
aaron
66fc8ea444
GSA-182: Adding support for BED interval files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1767 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 02:45:31 +00:00
hanna
aec83b401d
SSG multithreading doesn't play well with some I/O changes made since I last svn up'd. Reverting until I can find the reason.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1766 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 19:48:57 +00:00
hanna
8a503c86b6
Code supporting SSG proof-of-concept shared memory parallelism.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1765 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:56:16 +00:00
ebanks
fb619bd593
-Refactoring: make GenotypeCalculationModel constructors empty so that they don't have to be updated every time we add a new parameter; instead put that logic in the super class's initialize method (making everything protected so that only the factory can access them)
-Adding initial version of Multi-sample calculation model. This still needs much work: it needs to be cleaned up and finished. Right now, it (purposely) throws a RuntimeException after completing the EM loop.
Also:
-Fix logic in GenotypeLikelihoods.setPriors
-Add logger to the models for output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1764 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:10:36 +00:00
sjia
98076db6b4
Modified CreatePedFileWalker to output PED file given HLA allele names
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1763 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 03:06:42 +00:00
hanna
56bc4fa21a
Fixed bug where not all alignments were returned if read aligned to multiple locations. Enhanced test suite to validate all alignments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1762 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-04 18:20:20 +00:00
hanna
05aa928e3e
Fix off-by-number-of-deletions issue with negative strand reads. Improved performance by factor of 2.5x.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1761 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-03 21:55:18 +00:00
chartl
7605ee500c
Idiocy! All tests were being disabled because I forgot the instanceof
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1760 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:04:56 +00:00
chartl
88d0890cc3
Made PooledGenotypeConcordance a standard test in VariantEval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1759 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:03:31 +00:00
aaron
7fc4472e6d
A big fix for MergingSamRecordIterator, where we weren't handling the comparisons of SAMRecords correctly (we weren't applying the new reference index first, so sometimes the MT contig would be ID 23, sometimes 24 in different records).
Also a fix to the GLF tests, and a correction to PrintReadsWalker to remove the close() on the output source, the source handles that itself (and you get a double close).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1758 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:35:35 +00:00
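Commit 1758 describes a merge-ordering bug: when reads from several BAM files are merged, each file's own contig index has to be translated into the merged header's index before records are compared, otherwise the same contig (MT in the message) can carry different IDs in different records. A minimal sketch of that ordering, using a hypothetical stand-in record type and a hypothetical per-file remapping table rather than Picard's actual `SAMRecord` or merging classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MergedCoordinateOrder {
    /** Hypothetical stand-in for a read: source file, that file's contig index, start. */
    static final class Rec {
        final int fileId, refIndex, start;
        Rec(int fileId, int refIndex, int start) {
            this.fileId = fileId;
            this.refIndex = refIndex;
            this.start = start;
        }
    }

    /**
     * remap[fileId][refIndex] is the contig's index in the merged header.
     * Records are ordered by merged contig index first, then by start position.
     */
    static Comparator<Rec> comparator(int[][] remap) {
        return Comparator.<Rec>comparingInt(r -> remap[r.fileId][r.refIndex])
                .thenComparingInt(r -> r.start);
    }

    public static void main(String[] args) {
        // File 0's header is [chr1, MT]; file 1's is [chr1, chr2, MT];
        // the merged header is [chr1, chr2, MT].
        int[][] remap = { {0, 2}, {0, 1, 2} };
        List<Rec> recs = new ArrayList<>();
        recs.add(new Rec(0, 1, 100)); // MT in file 0 (local index 1)
        recs.add(new Rec(1, 1, 500)); // chr2 in file 1 (local index 1)
        recs.add(new Rec(1, 2, 50));  // MT in file 1 (local index 2)
        recs.sort(comparator(remap));
        for (Rec r : recs) {
            System.out.println("file " + r.fileId + ", start " + r.start);
        }
        // prints: file 1, start 500 / file 1, start 50 / file 0, start 100
    }
}
```

Comparing the raw local indices instead would treat file 0's MT and file 1's chr2 as the same contig (both index 1), which is the kind of misordering the commit message describes.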
chartl
68cb2ee54b
Tweaks to parameters for NQS analysis walkers; change to PowerAndCoverage for Jason Flannick (can input the number of alleles to compute power for, i.e. doubletons or tripletons, rather than statically checking singletons).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1757 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:11:27 +00:00
ebanks
53a4bd7f51
A better understanding of what's going on means no need for clearing the cache
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1755 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 18:07:46 +00:00
aaron
e885cc4b21
changes for corrected GLF likelihood output, along with better tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 20:45:05 +00:00
hanna
2309d19f6f
Bug fix from Michael Ross: mark second read in sequence as second of pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1753 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 14:34:36 +00:00
aaron
2e4949c4d6
Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
ebanks
303972aa4b
Yup, I broke the build...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1750 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:20:43 +00:00
ebanks
841d25cc44
Added ability to set the priors after construction (which requires flushing the likelihoods cache)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1749 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 19:55:49 +00:00
hanna
665951f9f0
Support negative strand alignments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1748 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 18:10:26 +00:00
hanna
d3b1732cca
Start of refactoring effort. Make construction of alignment object simpler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1747 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 15:19:31 +00:00
hanna
70e1aef550
Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 22:23:19 +00:00
aaron
b1c321f161
Adjusted Genotype concordance to more accurately use the new Genotyping code, fixed the VCF rod, and temp. fix the build by reintroducing Sherman's ReadCigarFormatter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1745 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 21:28:21 +00:00
sjia
9b78a789e2
HLA Caller 2.0 Walkers:
CalculateBaseLikelihoodsWalker.java walks through reads and calculates likelihoods using SSG at each base position
CalculateAlleleLikelihoodsWalker.java walks through HLA dictionary and calculates likelihoods for allele pairs given output of CalculateBaseLikelihoodsWalker.java
CalculatePhaseLikelihoodsWalker.java walks through reads and calculates likelihood scores for allele pairs given phase information
File Readers:
BaseLikelihoodsFileReader.java reads text file of likelihoods output by SSG
FrequencyFileReader.java reads text file of HLA allele frequencies
PolymorphicSitesFileReader.java reads text file of polymorphic sites in the HLA dictionary
SAMFileReader.java reads a sam file (used to read HLA dictionary when in another walker)
SimilarityFileReader.java reads a text file of how similar each read is to the closest HLA allele (used to filter misaligned reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1744 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:45:55 +00:00
chartl
281a77c981
Bugfix. isMismatch() was actually computing isMatch().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1743 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:04:59 +00:00
chartl
e28b45688c
More NQS Related Walkers to play with
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1742 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:01:04 +00:00
ebanks
9ef80e3c3c
One minor addition: to incorporate Pooled calling (and to be as general as possible), we allow the genotype calculation model to use rods if it wants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1741 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 17:05:59 +00:00
ebanks
19bfe43173
First pass at a unified caller, being checked in now so Mark can give feedback if he chooses and so Matt can debug issues with the ArgumentCollection class.
Some notes:
1. This design should be flexible enough to include pooled calling (for now) after discussions with Chris.
2. Using the unified caller with the SingleSampleCalculationModel emits exactly the same output as SSG over all of chr20 for NA12878. Additionally, when we include the "max deletions allowed at a locus" argument (so we don't try to call SNPs at deletion sites), 233 SNP calls on chr20 that were clearly indel artifacts are removed.
3. The MultiSampleEMCalculationModel is still a work in progress and will be checked in later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1740 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 16:48:15 +00:00
ebanks
8bd345ba00
Generalized deletions in pileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1739 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 15:58:43 +00:00
andrewk
6134f49e3c
Convert the de novo SNP caller to run using parent1 and parent2 BAM files (by splitting contexts by reader using getMergedReadGroupsByReaders) instead of geli files, providing a large speed-up and obviating the need for large whole-genome geli files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1738 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:42:21 +00:00
andrewk
5dab95aa5a
Fix getMergedReadGroupsByReaders so that it provides read groups the same way Picard does; it now works correctly when the input read files have no clashes in their read groups and retain their original read group names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1737 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:35:50 +00:00
andrewk
5662a88ee1
Cosmetic change to list sampling functions: the typical usage of n and k were reversed. No change in functionality of the classes has been made and unit tests still pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1736 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 18:12:32 +00:00
aaron
39598f1f0a
switching the concordance walker over to the new Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1735 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 15:46:36 +00:00
asivache
bce2f0d7cf
Now instantiates the list of alternative consensuses to evaluate as a LinkedHashSet to guarantee iterator traversal order. The old implementation used HashSet and exhibited unstable behavior when two alt consensuses turned out to be equally good: depending on the run conditions (including the size of the interval set being cleaned??), either one could be seen first and selected as the 'best' one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1734 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 06:15:46 +00:00
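Why iteration order matters here can be shown with a minimal sketch: if the selection loop keeps the first candidate it sees when using a strict `<`, then a tie between two equally good consensuses is decided by the set's iteration order. `LinkedHashSet` iterates in insertion order, so the tie always goes to the earliest-inserted candidate, whereas `HashSet` order depends on hashing and capacity. The class, method names, and integer score below are hypothetical stand-ins, not the cleaner's actual code.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of deterministic tie-breaking; not the cleaner's real code.
class ConsensusPicker {
    // Returns the candidate with the lowest score. Because the comparison is strict,
    // the first of several equally scored candidates in iteration order wins.
    static String pickBest(Set<String> candidates, Map<String, Integer> score) {
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (String c : candidates) {
            int s = score.get(c);
            if (s < bestScore) {   // strict '<' keeps the earlier of two equal scores
                best = c;
                bestScore = s;
            }
        }
        return best;
    }

    // LinkedHashSet preserves insertion order, which makes pickBest reproducible.
    static Set<String> newCandidateSet() {
        return new LinkedHashSet<>();
    }
}
```

With a plain `HashSet`, the same inputs can yield a different winner when the set's capacity changes, which matches the instability described in the commit.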
asivache
663175e868
Bug fix: when jumping onto the next contig (chromosome), the walker was discarding the last mismatch interval it was still holding from the previous chr without printing it; now it gets printed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1733 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 22:24:34 +00:00
asivache
92c6efabb7
moving IndelGenotyper out of playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1732 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:44:49 +00:00
asivache
aec61c558b
moving IndelGenotyper out from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1731 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:43:53 +00:00
chartl
fe6d810515
Some basic commits that I've been sitting on for a while now:
...
@PooledGenotypeConcordance - changes to output; now also reports false negatives and false positives as interesting sites. It's been like this in my directory for ages, just never committed.
@NQSExtendedGroupsCovariantWalker - formatting change.
@NQSTabularDistributionWalker - breaks out the full (window_size)-dimensional empirical error rate distribution by window. So with a window of size 3, the quality score sequences 22 25 23 and 22 25 24 each have their own bin (each of the 40^3 sequences gets one) for match and mismatch counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1730 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:35:50 +00:00
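One way to give every quality-score window its own bin is to read the window as a base-40 number, which yields exactly 40^3 = 64,000 bins for a window of size 3. The sketch below assumes qualities are clamped to 0..39 and uses hypothetical class and field names; it is not the NQSTabularDistributionWalker code itself.

```java
// Hypothetical sketch of (window_size)-dimensional binning of quality windows.
class QualityWindowBins {
    static final int NUM_QUALS = 40; // assumes quality scores are clamped to [0, 39]

    final long[] matches;
    final long[] mismatches;
    final int windowSize;

    // Memory grows as 40^windowSize, so this only makes sense for small windows.
    QualityWindowBins(int windowSize) {
        this.windowSize = windowSize;
        int bins = (int) Math.pow(NUM_QUALS, windowSize); // e.g. 40^3 = 64000
        matches = new long[bins];
        mismatches = new long[bins];
    }

    // Reads the window as a base-40 number, so each distinct quality sequence gets its own bin.
    int binIndex(int[] window) {
        if (window.length != windowSize) throw new IllegalArgumentException("wrong window size");
        int index = 0;
        for (int q : window) {
            int clamped = Math.max(0, Math.min(NUM_QUALS - 1, q));
            index = index * NUM_QUALS + clamped;
        }
        return index;
    }

    // Adds one observation to the match or mismatch count of the window's bin.
    void record(int[] window, boolean isMismatch) {
        int i = binIndex(window);
        if (isMismatch) mismatches[i]++; else matches[i]++;
    }
}
```

The windows 22 25 23 and 22 25 24 from the commit message land in adjacent but distinct bins, and the empirical error rate for a bin is then `mismatches[i] / (matches[i] + mismatches[i])`.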
sjia
f7684d9e1b
ImputeAllelesWalker fills missing portions of HLA dictionary based on best allele matches
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1729 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 18:51:46 +00:00
sjia
235de38c2e
Updates to FindClosestAlleleWalker and CreateHaplotypesWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1728 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:41:58 +00:00
aaron
2b7d39035a
switched over the FastaAlternateReferenceWalker to the Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1726 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:09:43 +00:00
aaron
7ffc1d97ef
Cut DeNovoSNPWalker over to the new Variation system, some renaming of methods on the Variation interface, and some corrections on the interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1724 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 04:35:52 +00:00
depristo
392152f149
1000x performance improvements to MSG for crisis control
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1723 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 23:44:33 +00:00
hanna
44879c81b0
Add in weights. Massive performance improvements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1722 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 23:19:15 +00:00
hanna
3b79f9eddc
Support 'N's and other mismatch characters in the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1721 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 21:41:30 +00:00
hanna
08e8d2183a
Indels supported. Variable gap penalties are not yet taken into account.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1720 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 21:03:02 +00:00
aaron
d2af26e81f
Pooled EM SNP Rod converted over to the Variation interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1719 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:33:11 +00:00
ebanks
97105ac001
We need to return a null RODRecordList when the default value is null (as opposed to a list with a single null value), because that's what everyone is expecting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1718 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:23:12 +00:00
ebanks
d4b40bc06f
Filter for reads with missing read groups so we can safely assume all reads have valid read groups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1717 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:10:26 +00:00
ebanks
90de2e0cde
Added ability to specify whether you want to use a point estimate or fair coin test calculation; for now you can use either but fair coin test is still experimental as it needs to be parametrized correctly. This job will hopefully be done by the future Bioinformatic Analyst...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1716 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:29:50 +00:00
aaron
d262cbd41c
changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:16:11 +00:00
sjia
1ee8ba590c
Reads cigar files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1713 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:14:10 +00:00
sjia
9422156e09
Finds closest allele for each read in bam file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1712 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:12:20 +00:00
sjia
5c5151c4e7
Creates ped file from reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1711 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 02:48:29 +00:00
hanna
b0ec7fc144
More comprehensive testing of BWT (mismatches only) module, and lots of bug fixes.
...
Limitations:
1) Can't handle RC alignments.
2) Can't handle indels.
3) Can't handle N's in reference bases.
4) Stops at first hit.
Ran BWT over a test suite of 800k E. coli reads. After removing alignments with indels / reads with Ns, the remaining reads were aligned with quality 'equal to' that of the alignment stored in the BAM file. In this case, 'equal' quality means no more mismatches to the reference than the existing alignment stored in the BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1710 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 23:44:59 +00:00
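BWT-based alignment rests on FM-index backward search: the pattern is consumed right to left while a suffix-array interval is narrowed using a count table `C` and an occurrence function `Occ`. The following is a minimal exact-match sketch (no mismatches, forward strand only), assuming a naive suffix sort and linear-time `Occ`, both far too slow for a real reference. It is not the GATK aligner's code, and the class name is hypothetical.

```java
import java.util.Arrays;

// Minimal FM-index sketch for exact matching via backward search.
// Naive construction: O(n^2 log n) suffix sort and O(n) occurrence counting.
// Assumes all text and pattern characters are in the 0..255 range (e.g. ACGT).
class TinyFmIndex {
    private final char[] bwt;
    private final int[] c = new int[256]; // c[a] = number of text chars smaller than a

    TinyFmIndex(String text) {
        String t = text + "$";               // '$' sorts before 'A', 'C', 'G', 'T'
        int n = t.length();
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        Arrays.sort(sa, (x, y) -> t.substring(x).compareTo(t.substring(y)));
        bwt = new char[n];
        for (int i = 0; i < n; i++) bwt[i] = t.charAt((sa[i] + n - 1) % n);
        int[] counts = new int[256];
        for (int i = 0; i < n; i++) counts[t.charAt(i)]++;
        int sum = 0;
        for (int a = 0; a < 256; a++) { c[a] = sum; sum += counts[a]; }
    }

    // occ(a, i) = number of occurrences of a in bwt[0, i)
    private int occ(char a, int i) {
        int k = 0;
        for (int j = 0; j < i; j++) if (bwt[j] == a) k++;
        return k;
    }

    // Returns how many times pattern occurs in the text (0 if it does not occur).
    int count(String pattern) {
        int lo = 0, hi = bwt.length;         // half-open suffix-array interval [lo, hi)
        for (int i = pattern.length() - 1; i >= 0 && lo < hi; i--) {
            char a = pattern.charAt(i);
            lo = c[a] + occ(a, lo);
            hi = c[a] + occ(a, hi);
        }
        return Math.max(0, hi - lo);
    }
}
```

Mismatch-tolerant search extends this by branching over alternative bases at each step and bounding the number of substitutions; that extension is not shown here.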
sjia
b446b3f1b6
CreateHaplotypeWalker now gives correct output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1709 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 21:13:52 +00:00
aaron
eeb14ec717
a couple of light changes to GenomeLocSortedSet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1708 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:38:53 +00:00
sjia
3916e165fb
New walker to output haplotypes for each read (for SNP analysis or imputation, etc)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1707 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:26:43 +00:00
ebanks
423a3ee894
Added a sequenom rod to empower Carrie to convert 1KG validation SNPs to sequenom format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1706 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:22:09 +00:00
chartl
63f3d45ca4
fixing the build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1705 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:04:09 +00:00
chartl
540e1b971f
And we fix one boneheaded mistake, which was actually causing the problem; though the last change was still correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1704 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:26:45 +00:00
chartl
124ca68fa8
And an IMMEDIATE minor fix (want neighborhood quality > base quality to be represented correctly)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1703 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:21:09 +00:00
chartl
8cdb78ebee
More sophisticated version of the NQSCovariantWalker - modified to be more explicit about how much higher the
...
quality score of a particular base is than the quality score of its neighbors. The granularity of the binning
jumps from 32 groups to 860 groups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1702 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:18:24 +00:00
hanna
856bbd0320
Let Picard specify the default compression level.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1701 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:01:48 +00:00
aaron
f783cb30e0
adding an interface so that the current @Requires with ROD annotations work in walkers like VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1700 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:24:05 +00:00
hanna
ebfbe56b43
Make sure compression level always gets pushed into SAMFileWriterFactory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1699 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:20:26 +00:00
asivache
fa87dd386d
Now uses rodRefSeq in its new reincarnation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1698 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:19:36 +00:00
asivache
bf7cd66d53
New, simpler rodRefSeq. Fully relies on the ROD system standard mechanisms. Multiple transcripts over a given location will be now returned by the ROD system itself as RodRecordList<rodRefSeq>; and yes, rodRefSeq does represent a single transcript record now and implements Transcript interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1697 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:18:25 +00:00
asivache
8fa4c93f5a
Transcript is now simply an interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1696 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:13:31 +00:00
asivache
fe36289e44
No one needs this, probably... Old experimental code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1695 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:50 +00:00
asivache
1bd4c0077c
Now that ROD system supports overlapping RODs, we do not need rodRefSeq to be too smart and read in all the overlapping records (transcripts) on its own; leave it to the generic ROD mechanism.
...
PARTIAL commit; new, simpler rodRefSeq will reappear in a sec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1694 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:16 +00:00
sjia
aa66074a0e
Compares each read to the HLA dictionary and outputs closest allele, as well as other stats
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1693 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 16:17:23 +00:00
aaron
11c32b588f
fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:54:47 +00:00
ebanks
0748d80baa
Added a convenience method in rodDbSNP to deal with Andrey's changes to the rod. Now you can just ask for the first real SNP rod from the list and not have to think about how it works.
...
CountCovariates uses it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1688 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:15:40 +00:00
hanna
14477bb48e
Unidirectional alignments with mismatches now working. Significant refactoring will be required.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1686 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 19:05:10 +00:00
sjia
22932042ea
Combined Scores, bug fixed for printing HLA-C
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1685 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:28:16 +00:00
ebanks
682b765536
bug: need to upper case chars so that == works throughout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1684 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:20:43 +00:00
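For Java `char` values, `'a' == 'A'` is false, so bases must be normalized to one case before any `==` comparison. A minimal sketch of that normalization; the helper names are hypothetical.

```java
// Hypothetical helpers: normalize case so that plain == comparisons on bases work.
class BaseCompare {
    // Compares two bases case-insensitively by upper-casing both first.
    static boolean sameBase(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Upper-cases a whole base string once, so later comparisons can use plain ==.
    static char[] upperCased(String bases) {
        char[] out = bases.toCharArray();
        for (int i = 0; i < out.length; i++) out[i] = Character.toUpperCase(out[i]);
        return out;
    }
}
```

Upper-casing once at load time, rather than at every comparison, keeps the hot comparison path a single `==`.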
asivache
d7d0b270d1
now supports blacklisting lanes (with -BL option will ignore reads from any of the specified lanes)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1682 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:57 +00:00
asivache
57d31b8e9b
Filter that discards reads from specific lanes; and also its friend that makes blacklisting a set of lanes from the GATK command line a one-liner.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1681 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:06 +00:00
aaron
83a9eebcc4
fixed a bug I checked in that Eric found, for intervals with no start or stop coordinate. Now I owe Eric a cookie, and Milk Street is so far away. Damn.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1679 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 04:34:18 +00:00
ebanks
5ce42cbab3
After thinking about this a bit more, it makes sense to pull this functionality out of my walker and into the GenomeLocParser where everyone else can benefit from it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1677 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 01:32:35 +00:00
aaron
7bfb5fad27
fixing the dbSNP test. Also removing unnecessary comments from the GenomeLocParser, added some tests, and commented out the performance test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1676 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 23:32:24 +00:00
aaron
39a47491a9
changes to make GenomeLoc string parsing 25% faster
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1675 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 22:37:47 +00:00
ebanks
b1dc6d65e4
interval merging is now blazingly fast
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1674 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 21:15:04 +00:00
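The commit does not say how the speed-up was achieved. A standard way to merge intervals quickly is to sort once by (contig, start) and sweep, which costs O(n log n). The sketch below assumes that approach, sorts contigs lexicographically instead of by sequence-dictionary order, and merges abutting as well as overlapping intervals; `SimpleInterval` and `IntervalMerger` are hypothetical stand-ins, not GenomeLoc or GenomeLocParser code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified 1-based, inclusive interval standing in for a GenomeLoc.
class SimpleInterval {
    final String contig;
    final long start, stop;
    SimpleInterval(String contig, long start, long stop) { this.contig = contig; this.start = start; this.stop = stop; }
    @Override public String toString() { return contig + ":" + start + "-" + stop; }
}

class IntervalMerger {
    // Sorts by contig then start, then sweeps once, merging overlapping or abutting intervals.
    static List<SimpleInterval> merge(List<SimpleInterval> input) {
        List<SimpleInterval> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((SimpleInterval i) -> i.contig).thenComparingLong(i -> i.start));
        List<SimpleInterval> out = new ArrayList<>();
        for (SimpleInterval cur : sorted) {
            if (!out.isEmpty()) {
                SimpleInterval last = out.get(out.size() - 1);
                if (last.contig.equals(cur.contig) && cur.start <= last.stop + 1) {
                    // Extend the previous interval instead of emitting a new one.
                    out.set(out.size() - 1, new SimpleInterval(last.contig, last.start, Math.max(last.stop, cur.stop)));
                    continue;
                }
            }
            out.add(cur);
        }
        return out;   // an empty input simply yields an empty list
    }
}
```

Whether abutting intervals (such as 100-300 and 301-400) should be merged is a policy choice; dropping the `+ 1` merges only true overlaps.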
asivache
15135788ca
OK, let's bite the bullet. Now rodDbSNP objects are 'isSNP()' only when they are annotated as 'exact', not a 'range'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1673 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 19:25:16 +00:00
asivache
8ad181f46f
Note to myself: do 'ant clean' now and then or old versions of the code that suddenly became invalid will stick around. The world is not perfect, and neither is automatic dependency resolution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1672 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:40:52 +00:00
asivache
fb09835ef8
Changed to accommodate new ROD system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1671 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:10:56 +00:00
asivache
d2d1354199
Now uses BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1670 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:49 +00:00
asivache
f4d270cba4
These classes now use BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1669 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:15 +00:00
asivache
29adc0ca1c
Little class that can be used to simulate the results returned by the old ROD system. This is needed to keep a couple of tests from breaking. All the code that uses this class must be changed urgently to accommodate the data as returned by the new ROD system, and the corresponding tests (MD5 sums) have to be modified as well, since some data as seen through the new ROD system is indeed different.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1668 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:58:56 +00:00
asivache
a6bd509593
Changing the carpet under your feet!! New incremental update to the ROD system has arrived.
...
All the updated classes now make use of the new SeekableRodIterator instead of RODIterator. The RODIterator class is deleted. This batch makes only trivial updates to tests dictated by the change in the ROD system interface. A few less trivial updates will follow. This is a partial commit; a few walkers also still need to be updated, hold on...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1667 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:55:22 +00:00
asivache
4c67a49ccb
Removed unused imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1666 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:45:22 +00:00
hanna
e7f44ada98
Make unpackList public static so that Doug can use it in the scatter/gather framework.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1665 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 15:32:49 +00:00
ebanks
7b627fd622
Check for empty interval lists to merge
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1664 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 04:34:26 +00:00
hanna
7f5778c966
Update gsadevelopers -> gsahelp.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1663 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-20 23:36:54 +00:00
aaron
3a487dd64e
little fixes; also fixed a tyPo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:38:51 +00:00
aaron
b6d7d6acc6
fix for the eval tests, and a change to the backedbygenotypes interface, more changes to come
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1661 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:25:16 +00:00
depristo
4318f75910
tiny cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1660 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:04:25 +00:00
depristo
3a341b2f06
Fixes for VariantEval for genotyping mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
aaron
7b39aa4966
Adding the VCF ROD. Also made the VCF objects much more user-friendly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
sjia
83e6e5a3e4
Calculates Probability for each allele combination (using likelihood score and allele frequencies only)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1656 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 18:46:38 +00:00
ebanks
b19fd4d45c
Damn unit tests have a null Toolkit()...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1654 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 17:10:49 +00:00
ebanks
90626c843d
oops - we don't need reference bases, but we still need reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1653 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:24:45 +00:00
ebanks
2b2df4e1ba
- Fix the CleanedReadInjector to deal with -L intervals correctly.
...
- Some walkers don't use the ref base, so speed up traversals by not requiring it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1652 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:17:58 +00:00
ebanks
7da9ff2a9e
Put back the check that both chip and variant are not null.
...
Also, sanity check that ref is not 'N'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1651 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:03:54 +00:00
asivache
94618044e8
Starting an update of ROD system. These basic classes will completely replace old ones, but with this update they are not linked to anything, so this checkpoint should be safe.
...
The main reason for the change is that there can be (and are!) multiple RODs overlapping with a single reference base position in a single track. There can be two "trivial" RODs at the same location (e.g. samtools pileup will have two point-like records at putative indel sites: one for the reference, the other for the indel itself). Or there can be one or more "extended" RODs (length > 1), e.g. dbSNP can report an indel at Z:510-525 AND a SNP at Z:515.
The ReferenceOrderedDatum object (and children) will not be changed, but it is now explicitly interpreted as a single data *record*, possibly one of many available from a given track for the current site. As long as a single data record occupies one line in a data file, the new ROD system will take care of loading and keeping multiple records, including extended (length > 1) ones, and will automatically drop the records when they finally go out of scope. For one-line-per-record, multiple-records-per-site RODs, there is no longer any need for the hack used so far, which involved passing the ROD's own iterator implementation through the reflection mechanism (though it will still work).
* RODRecordList:
the ROD system (its iterators) will now always return a LIST of all RODs available at the current position or at the current query interval (see below). This class is a trivial wrapper for a list of ROD objects, with an added location argument for the whole collection. The location of the RODRecordList is where the ROD system is currently sitting: a single, current base on the reference (if next() traversal is performed), or the location of the query interval when returned by seekForward() (see below). The ROD objects themselves will have their locations set according to the original data in the file. Hence, using the above example of a dbSNP indel at Z:510-525 and a SNP at Z:515, when moving to position Z:515 the ROD system will return a RODRecordList with location Z:515 and with two ROD objects packaged inside, one with location Z:510-525, the other with Z:515.
* RODRecordIterator:
Almost identical to old SimpleRODIterator used by ReferenceOrderedData; this is a low-level iterator that walks over records in the data file (with a callback to ROD's ::parseLine() to parse real data)
* SeekableRODIterator:
a decorator class that wraps around Iterator<ROD> (such as RODRecordIterator) and makes the data traversable by reference position, rather than record by record. This is a reimplementation of the old RODIterator. SeekableRODIterator's ::next() moves to the next position on the ref and returns all RODs overlapping with that position (as a RODRecordList). This iterator also adds a seekForward(loc) operation that allows fast-forwarding to a specified position or interval. Length > 1 query arguments (extended intervals) are fully supported by seekForward(); the returned RODRecordList will contain all RODs overlapping with the specified interval, and the location of the returned RODRecordList object will be set to that query interval. NOTE: it is ILLEGAL to perform next() after a seekForward() query with a length > 1 interval. seekForward() with a point-like (length = 1) interval re-enables next().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1650 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 15:58:37 +00:00
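The position-driven behavior described above (return every record overlapping the current base, keep extended records while they remain in scope, drop them once they end) can be sketched as a sliding window over start-sorted records. This is a simplified, hypothetical model, assuming a single contig, integer positions, and no seekForward(); `RodRecord` and `OverlapWindow` are illustrative names, not the actual SeekableRODIterator or RODRecordList classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// One data record with an inclusive [start, stop] span on a single contig.
class RodRecord {
    final String name;
    final int start, stop;
    RodRecord(String name, int start, int stop) { this.name = name; this.start = start; this.stop = stop; }
}

// Hypothetical position-driven view over records sorted by start.
class OverlapWindow {
    private final Iterator<RodRecord> source;   // records must arrive sorted by start
    private RodRecord pending;                  // next unread record, or null when exhausted
    private final List<RodRecord> active = new ArrayList<>();
    private int lastPos = Integer.MIN_VALUE;

    OverlapWindow(Iterator<RodRecord> sortedRecords) {
        this.source = sortedRecords;
        this.pending = source.hasNext() ? source.next() : null;
    }

    // Returns all records overlapping pos; positions must be non-decreasing.
    List<RodRecord> advanceTo(int pos) {
        if (pos < lastPos) throw new IllegalArgumentException("cannot move backwards");
        lastPos = pos;
        // Load every record that starts at or before pos.
        while (pending != null && pending.start <= pos) {
            active.add(pending);
            pending = source.hasNext() ? source.next() : null;
        }
        // Drop records that ended before pos; with sorted input they can never overlap again.
        active.removeIf(r -> r.stop < pos);
        return new ArrayList<>(active);
    }
}
```

Fed the dbSNP example from the commit message (an indel at 510-525 and a SNP at 515), advancing to 515 returns both records, advancing to 516 returns only the indel, and advancing to 526 returns nothing.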
ebanks
66a4de9a1d
Genotype check should be case-insensitive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1649 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 03:23:30 +00:00
hanna
c186a49d55
Time for a reorganization. Repackage generally useful alignment classes lower in the package structure, and create a subpackage for bwa-specific code. Repackage BWA alignment code away from BWT representation. Isolate byte- and word-packing streams in another package that will ultimately be killed off en masse.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1648 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 23:28:47 +00:00
hanna
b4df089b59
Putting some of the required data structures together for imperfect lookup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1647 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 22:43:11 +00:00
hanna
355136928e
Play nice with other jobs in this VM -- don't close stdout / stderr.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1646 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 18:55:08 +00:00
sjia
0e73b2ba8e
Use population allele frequencies to distinguish between top candidates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1645 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 15:49:19 +00:00
chartl
534486a254
Output formatting changed:
...
- summary output now reported as a percentage rather than proportion; 2 sigfigs
- fixed minor bug where FNR was calculated over total calls rather than total variant sites
- column headers are_now_contiguous_strings
- spacing fixed
- "No Call" separated from "Ref Call" as its own column
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1644 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 14:00:25 +00:00
depristo
73bec6f36d
Now uses expanding array list for coverage histograms. No hard limit on maximum depth now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1643 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 23:27:25 +00:00
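A histogram with no hard depth limit can be backed by a list that grows on demand, padding with zeros up to the largest depth seen. A minimal sketch; `DepthHistogram` is a hypothetical name, not the class used in the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Coverage histogram backed by a growable list: index = depth, value = number of loci at that depth.
class DepthHistogram {
    private final List<Long> counts = new ArrayList<>();

    // Records one locus with the given depth, growing the list as needed.
    void add(int depth) {
        if (depth < 0) throw new IllegalArgumentException("depth must be >= 0");
        while (counts.size() <= depth) counts.add(0L);  // grow on demand instead of a fixed max depth
        counts.set(depth, counts.get(depth) + 1);
    }

    // Depths never seen (including those beyond the current size) have count 0.
    long count(int depth) {
        return depth < counts.size() ? counts.get(depth) : 0L;
    }

    int maxDepthSeen() {
        return counts.size() - 1;   // -1 when nothing has been added
    }
}
```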
chartl
4ad46590a3
Changes to PooledGenotypeConcordance:
...
Additional output & better output formatting. It has now undergone a good five hours of testing; and for pools of size 1 outputs exactly the same statistics as GenotypeConcordance (when GenotypeConcordance is modified to do nothing on reference='N'); and for pools of many sizes outputs close to the expected (by genetics) statistics. Looks like this is working properly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1642 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 21:45:01 +00:00
chartl
386a6442ba
Actually deleted now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1641 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 20:28:06 +00:00
chartl
8fce376792
Changes:
...
Deletion: PooledGenotypeConcordanceNew
Rewrite: PooledGenotypeConcordance. It works, and is blazing fast compared to the earlier version (an order of magnitude speedup)! And it is now entirely non-hacky, as opposed to before, when there were some hacky bits.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1640 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 20:22:16 +00:00
asivache
3e289fcaa4
A little piece that PairMaker needs in order to compile ;)
...
Iterates synchronously over two (name-ordered) single-end alignment SAM files with, possibly, multiple alignments per read and for each read name encountered returns pairs<all alignments for end1, all alignments for end2>
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1639 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 19:17:40 +00:00
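Synchronized iteration over two name-ordered files is essentially a merge-join on read name. The sketch below assumes both inputs are already sorted with the same `String.compareTo` ordering (real SAM queryname sorting may differ) and models each alignment as a hypothetical `(readName, where)` pair. Reads with multiple alignments are grouped, and a name missing from one file gets an empty list for that end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a single-end alignment record.
class Aln {
    final String readName;
    final String where;
    Aln(String readName, String where) { this.readName = readName; this.where = where; }
}

// All alignments of end1 and end2 for one read name.
class EndPair {
    final String readName;
    final List<Aln> end1 = new ArrayList<>();
    final List<Aln> end2 = new ArrayList<>();
    EndPair(String readName) { this.readName = readName; }
}

class NameOrderedPairer {
    // Merge-joins two name-sorted alignment lists, grouping multiple alignments per read name.
    static List<EndPair> pair(List<Aln> ends1, List<Aln> ends2) {
        List<EndPair> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < ends1.size() || j < ends2.size()) {
            // The next read name is the smaller of the two current heads.
            String name;
            if (i >= ends1.size()) name = ends2.get(j).readName;
            else if (j >= ends2.size()) name = ends1.get(i).readName;
            else {
                String a = ends1.get(i).readName, b = ends2.get(j).readName;
                name = a.compareTo(b) <= 0 ? a : b;
            }
            // Consume every alignment with that name from each input (possibly none).
            EndPair p = new EndPair(name);
            while (i < ends1.size() && ends1.get(i).readName.equals(name)) p.end1.add(ends1.get(i++));
            while (j < ends2.size() && ends2.get(j).readName.equals(name)) p.end2.add(ends2.get(j++));
            out.add(p);
        }
        return out;
    }
}
```

Each loop iteration consumes at least one record, since the chosen name is always the head of one input, so the traversal terminates in a single linear pass.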
asivache
2f29cf59ba
Very early, half-baked version. All it can do right now is take two SAM files with end1 and end2 individual single-end alignments from a paired-end run and spit out a "paired" BAM file that contains ONLY properly paired ends (both ends align uniquely && both ends align to the same chromosome && the ends align in proper orientation). Insert size is currently not used (and not set in the output). Unpaired/unmapped reads are NOT transferred into the output BAM. For the pairs that do get written, the output is (should be) standard-conforming: all flags are properly set and mate pair information is correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1637 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 18:38:18 +00:00
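The three pairing conditions listed in this commit can be written as a small predicate. This is a hedged sketch: the commit does not define "proper orientation", so the check below assumes the common forward/reverse (FR) layout with the forward-strand end leftmost, compared by start coordinates only. `EndAlignment` and `ProperPairCheck` are hypothetical names.

```java
import java.util.List;

// Hypothetical single-end alignment: contig, 1-based leftmost start, strand.
class EndAlignment {
    final String contig;
    final int start;
    final boolean reverseStrand;
    EndAlignment(String contig, int start, boolean reverseStrand) {
        this.contig = contig; this.start = start; this.reverseStrand = reverseStrand;
    }
}

class ProperPairCheck {
    // True only if both ends align uniquely, to the same contig, in (assumed) FR orientation.
    static boolean isProperPair(List<EndAlignment> end1, List<EndAlignment> end2) {
        if (end1.size() != 1 || end2.size() != 1) return false;       // unique alignments only
        EndAlignment a = end1.get(0), b = end2.get(0);
        if (!a.contig.equals(b.contig)) return false;                  // same chromosome
        if (a.reverseStrand == b.reverseStrand) return false;          // ends on opposite strands
        EndAlignment fwd = a.reverseStrand ? b : a;
        EndAlignment rev = a.reverseStrand ? a : b;
        return fwd.start <= rev.start;                                 // assumed FR: forward end leftmost
    }
}
```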
ebanks
5d85bd9671
By default, VF should ask for deleted bases so that they show up in coverage.
...
The Strand filter then needs to ignore those bases when determining bias.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:46:09 +00:00
ebanks
a7c306f757
-deal with offsets that can be -1
...
-added option to have "D"s inserted for deleted bases in pileup strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1635 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:44:57 +00:00
hanna
01a9b1c63b
Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 08:33:56 +00:00
chartl
f6bdb47bb6
Addition:
...
@PooledGenotypeConcordanceNew - a new version of the pooled genotype concordance test for Variant Eval. The code was altered to be more extensible, to use a private class for handling the count tables so it doesn't gunk up the code in the test itself, and for easy debugging. The hackier methods from the original were rewritten properly. It currently computes more statistics than it outputs. The code compiles, is never called by anything, and breaks none of the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1632 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 04:14:58 +00:00
aaron
542d817688
more cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1631 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:42:03 +00:00
hanna
9f7cf73411
Output stream management fixes. I completely screwed up the output stream management system, but cleverly masked this fact by breaking some other stream management functionality that masked the problem.
...
Sigh.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1630 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:06:45 +00:00
hanna
17758b381c
Properly initialize redirected output streams in case of out and err.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1629 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 19:47:43 +00:00
andrewk
00dfe014b7
Added option to FastaReferenceWalker to change output FASTA file format's line width and to remove header lines; allows dumping raw sequence using intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1628 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 18:00:30 +00:00
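The FASTA options in the commit above (configurable line width, optional header line) amount to a small formatting step. This sketch uses a hypothetical FastaFormatter rather than the walker's actual code; passing a very large width with the header disabled yields raw sequence.

```java
final class FastaFormatter {
    /** Formats one FASTA record, wrapping the sequence at lineWidth characters. */
    static String format(String header, String sequence, int lineWidth, boolean includeHeader) {
        if (lineWidth <= 0) throw new IllegalArgumentException("lineWidth must be positive");
        StringBuilder sb = new StringBuilder();
        if (includeHeader) sb.append('>').append(header).append('\n');
        for (int i = 0; i < sequence.length(); i += lineWidth) {
            // append the slice [i, min(i + lineWidth, length)) followed by a newline
            sb.append(sequence, i, Math.min(i + lineWidth, sequence.length())).append('\n');
        }
        return sb.toString();
    }
}
```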
hanna
b69eb208a6
Always create output files, even if no output was written to them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 17:58:14 +00:00
aaron
b401929e41
incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
andrewk
fb254759cb
Trivial: Don't print reduce result
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1621 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:42:20 +00:00
hanna
118071cfd8
Proof-of-concept perfect read aligner, implemented as described in section 2.4 of the BWA paper. Has successfully aligned a handful of reads. Requires significant cleanup and refactoring.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1617 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:54:56 +00:00
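A perfect (exact-match) aligner over a BWT is usually built on backward search: walk the pattern right to left, narrowing a suffix-array interval with the C table and Occ counts. The sketch below illustrates that technique only and is not the code from this commit: PerfectMatchBwt is a hypothetical name, the suffix array is built by brute-force sorting, and Occ is a linear scan rather than a sampled table.

```java
import java.util.Arrays;

final class PerfectMatchBwt {
    final String bwt;              // Burrows-Wheeler transform of text + '$'
    final int[] c = new int[128];  // C[a]: count of characters in text + '$' smaller than a

    PerfectMatchBwt(String text) {
        String t = text + "$";     // '$' sorts before A, C, G, T
        int n = t.length();
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        // Naive suffix array: sort suffix start positions lexicographically.
        Arrays.sort(sa, (x, y) -> t.substring(x).compareTo(t.substring(y)));
        StringBuilder b = new StringBuilder(n);
        for (int i : sa) b.append(t.charAt((i + n - 1) % n));  // character preceding each suffix
        bwt = b.toString();
        int[] freq = new int[128];
        for (char ch : t.toCharArray()) freq[ch]++;
        for (int a = 1; a < 128; a++) c[a] = c[a - 1] + freq[a - 1];
    }

    /** Occ(a, i): occurrences of a in bwt[0, i). */
    int occ(char a, int i) {
        int count = 0;
        for (int k = 0; k < i; k++) if (bwt.charAt(k) == a) count++;
        return count;
    }

    /** Number of exact occurrences of pattern in the text. */
    int count(String pattern) {
        int lo = 0, hi = bwt.length();  // suffix-array interval [lo, hi)
        for (int k = pattern.length() - 1; k >= 0 && lo < hi; k--) {
            char a = pattern.charAt(k);
            lo = c[a] + occ(a, lo);
            hi = c[a] + occ(a, hi);
        }
        return Math.max(0, hi - lo);
    }
}
```

For "ACGTACGT" this builds the BWT "TT$AACCGG", and count("ACG") narrows the interval to two suffixes.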
ebanks
01e7b39c8d
1. Don't print out values in filter field of the VCF.
...
2. Fix ratio printouts (for params file)
3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:03:39 +00:00
ebanks
436f543b3b
I owe Doug a beer for finding this:
...
don't print out intervals to be merged if they're not within the global -L intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:22:30 +00:00
chartl
7d6d114ab5
Additions:
...
@NQSMismatchCovariantWalker - Walks along the gene calculating the table
# NQS
# Q score
# mismatches at non-dbsnp sites
# total number of bases at non-dbsnp sites
And prints it out at the end.
Changes:
@PooledGenotypeConcordance now works. Takes a path to a file listing a bunch of hapmap IDs in whatever pool we want to check, reads those in, and checks for concordance by name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1614 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:12:04 +00:00
sjia
9be1832d7b
Phasing version 1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1613 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 16:10:37 +00:00
asivache
a009592662
The life in the magical kingdom of fully spec-conforming SAM files would be so... magical. For now, however, there are plenty of ways to end up with inconsistent SAM records. For instance, a SAM file with a missing header will result in SAM records with ref. name set, but getReferenceIndex() returning null. This, in turn, was tripping isReadUnmapped(). The method is now fixed, so that it suffices to have *either* reference name *or* reference index set for the read to be considered mapped (the flag is still checked)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1612 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 16:04:19 +00:00
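The rule described in the commit above (flag still honored; either a reference name or a reference index is enough) can be sketched as follows. ReadFields and UnmappedCheck are hypothetical stand-ins rather than the SAMRecord or AlignmentUtils API, and treating a missing alignment start as unmapped is an assumption drawn from the earlier isReadUnmapped() description (r1506), not a transcription of the fixed method.

```java
/** Hypothetical, simplified view of the SAM fields involved. */
final class ReadFields {
    final boolean unmappedFlag;    // FLAG bit 0x4
    final String referenceName;    // RNAME; "*" or null when absent
    final Integer referenceIndex;  // sequence-dictionary index; null when there is no header
    final int alignmentStart;      // POS, 1-based; 0 means "no position"

    ReadFields(boolean unmappedFlag, String referenceName, Integer referenceIndex, int alignmentStart) {
        this.unmappedFlag = unmappedFlag;
        this.referenceName = referenceName;
        this.referenceIndex = referenceIndex;
        this.alignmentStart = alignmentStart;
    }
}

final class UnmappedCheck {
    static boolean isReadUnmapped(ReadFields r) {
        if (r.unmappedFlag) return true;                  // the flag is still checked
        boolean hasName = r.referenceName != null && !"*".equals(r.referenceName);
        boolean hasIndex = r.referenceIndex != null && r.referenceIndex >= 0;
        if (!hasName && !hasIndex) return true;           // neither name nor index is set
        return r.alignmentStart <= 0;                     // assumption: no start means unmapped
    }
}
```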
aaron
e03fccb223
Changes to switch Variant Eval over to the new Variation system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 05:34:33 +00:00
aaron
5b41ef5f70
rod DBSNP had a bug where the reference wasn't calculated correctly under certain conditions. Fixed getRefBasesFWD and getRefSnpFWD so that they are more in line with getAltBasesFWD and getAltSnpFWD. Also updated Variant Eval tests to reflect this change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1609 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 23:48:58 +00:00
chartl
5cf1d6c104
Bugfix - this walker was never changed to work with the new PoolUtils methods after those methods were changed to return ReadOffsetQuad objects rather than nested pairs. This broke the build :(.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1608 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 19:39:23 +00:00
ebanks
c669e8d5ad
Use constant seed in the random generator so we can be stable (and thus unit tests will work)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1607 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 17:40:56 +00:00
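Seeding the generator with a constant, as in the commit above, makes random steps such as a bootstrap partition reproducible from run to run. A minimal sketch, assuming a hypothetical SeededPartition helper that splits items into two halves; the seed value 42 and the two-way split are illustrative choices, not taken from the code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeededPartition {
    static final long SEED = 42L;  // hypothetical constant seed

    /** Randomly splits items into two halves; the same seed always gives the same split. */
    static <T> List<List<T>> randomHalves(List<T> items, long seed) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));  // deterministic for a fixed seed
        int mid = shuffled.size() / 2;
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, mid)));
        parts.add(new ArrayList<>(shuffled.subList(mid, shuffled.size())));
        return parts;
    }
}
```

Because the output no longer varies, MD5-based integration tests over the results stay stable.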
ebanks
15178977e1
Naive tool to convert from vcf to geli text
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1606 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 17:25:02 +00:00
chartl
794bd26b20
Changed some ShortNames so they made more sense.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1604 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 01:32:12 +00:00
chartl
b353bd6f81
Added a Quad toString() method.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1603 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 01:13:57 +00:00
chartl
2e237a12e9
This commit has a bunch to do with cleaning up the CoverageAndPowerWalker code: implementing some new printing options,
...
but mostly altering the code so it's much more readable and understandable, and much less hacky-looking.
ADDED:
@Quad: This is just like Pair, except with four fields. In the original CoverageAndPowerWalker I often used
a pair of pairs to hold things, which made the code nigh unreadable.
@SQuad: An extension of Quad for when you want to store objects of the same type. Lets you simply declare
new SQuad<X> rather than new Quad<X,X,X,X>
@ReadOffsetQuad: An extension of Quad specifically for holding two lists of reads and two lists of offsets
Supports construction from AlignmentContexts and conversion to AlignmentContexts (given
a GenomeLoc). There are methods that make it very clear what the code is doing (getSecondRead()
rather than the cryptic getThird() )
@PowerAndCoverageWalker: The new version of CoverageAndPowerWalker. If the tests all go well, then I'll remove
the old version. New to this version is the ability to give an output file directly
to the walker, so that locus information prints to the file, while the final reduce
prints to standard out. Bootstrap iterations are now a command line argument rather
than a final int, and users can instruct the walker to print out the coverage/power
statistics for both the original reads, and those reads whose quality score exceeds
a user-defined threshold.
CHANGES:
@PoolUtils: Altered methods to accept as arguments, and return, Quad objects. Added a random partition method
for bootstrapping.
@CoverageAndPowerWalker: Altered methods to work with the new PoolUtils methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1602 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 01:00:04 +00:00
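The Quad, SQuad and ReadOffsetQuad classes described above can be sketched roughly as below. This is an illustration under assumptions, not the committed code: the accessor names getFirstReads(), getSecondReads() and so on are hypothetical, and the toString() format is invented.

```java
import java.util.List;
import java.util.Objects;

/** Four-field tuple in the spirit of Pair. */
class Quad<A, B, C, D> {
    private final A first;
    private final B second;
    private final C third;
    private final D fourth;

    Quad(A first, B second, C third, D fourth) {
        this.first = first; this.second = second; this.third = third; this.fourth = fourth;
    }

    A getFirst() { return first; }
    B getSecond() { return second; }
    C getThird() { return third; }
    D getFourth() { return fourth; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Quad)) return false;
        Quad<?, ?, ?, ?> q = (Quad<?, ?, ?, ?>) o;
        return Objects.equals(first, q.first) && Objects.equals(second, q.second)
            && Objects.equals(third, q.third) && Objects.equals(fourth, q.fourth);
    }

    @Override public int hashCode() { return Objects.hash(first, second, third, fourth); }

    @Override public String toString() {
        return "(" + first + ", " + second + ", " + third + ", " + fourth + ")";
    }
}

/** Same-typed Quad: write SQuad<X> instead of Quad<X, X, X, X>. */
class SQuad<X> extends Quad<X, X, X, X> {
    SQuad(X a, X b, X c, X d) { super(a, b, c, d); }
}

/** Two read lists plus their offsets, with descriptive accessors. */
class ReadOffsetQuad<R> extends Quad<List<R>, List<Integer>, List<R>, List<Integer>> {
    ReadOffsetQuad(List<R> firstReads, List<Integer> firstOffsets,
                   List<R> secondReads, List<Integer> secondOffsets) {
        super(firstReads, firstOffsets, secondReads, secondOffsets);
    }
    List<R> getFirstReads() { return getFirst(); }
    List<Integer> getFirstOffsets() { return getSecond(); }
    List<R> getSecondReads() { return getThird(); }        // clearer than getThird()
    List<Integer> getSecondOffsets() { return getFourth(); }
}
```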
depristo
6c7a300664
Missing file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1601 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:17:09 +00:00
depristo
6e13a36059
Framework for ROD walkers -- totally experimental and not working right now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
depristo
bd75a8d168
Unused code has been removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1599 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:12:23 +00:00
depristo
e8d544869d
Alignment context now supports the idea of skipped bases -- not currently in use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1598 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:11:38 +00:00
depristo
3ad97e4ab4
Easier to print GenomeLoc compareTo()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1597 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:10:35 +00:00
depristo
3949b4ac72
commented out version of next() and hasNext() that appear to be correct but are causing testing problems
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1596 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:09:21 +00:00
depristo
58105636c8
getBoundRods() convenience method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1595 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:57 +00:00
depristo
4e1eded389
Fixed bad compareTo operator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1594 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:10 +00:00
depristo
17ab1d8b25
General purpose merging iterator implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1593 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:06:15 +00:00
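A general-purpose merging iterator is most often built as a k-way merge: keep the current head of each sorted source in a priority queue, emit the smallest, and refill from that source. The sketch below uses that common technique. MergingIterator is a hypothetical name, and the commit does not say how its implementation works.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** k-way merge of already-sorted iterators into one sorted stream. */
final class MergingIterator<T> implements Iterator<T> {
    /** Current head element of one source, plus the source it came from. */
    private static final class Head<E> {
        E value;
        final Iterator<E> source;
        Head(E value, Iterator<E> source) { this.value = value; this.source = source; }
    }

    private final PriorityQueue<Head<T>> heap;

    MergingIterator(Collection<? extends Iterator<T>> sources, Comparator<? super T> order) {
        Comparator<Head<T>> byValue = (a, b) -> order.compare(a.value, b.value);
        heap = new PriorityQueue<Head<T>>(Math.max(1, sources.size()), byValue);
        for (Iterator<T> it : sources) {
            if (it.hasNext()) heap.add(new Head<T>(it.next(), it));  // seed with each first element
        }
    }

    @Override public boolean hasNext() { return !heap.isEmpty(); }

    @Override public T next() {
        Head<T> smallest = heap.poll();                 // O(log k) per element
        if (smallest == null) throw new NoSuchElementException();
        T out = smallest.value;
        if (smallest.source.hasNext()) {                // refill from the same source
            smallest.value = smallest.source.next();
            heap.add(smallest);
        }
        return out;
    }
}
```

Ties between sources come out in an unspecified order. A caller needing stability would have to add a source index to the comparison.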
hanna
275707f5f6
Data structure for counts, to isolate the user from wonky 'sometimes counts are cumulative, other times base-by-base' gotchas.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1592 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 20:53:24 +00:00
depristo
7c8b17b456
fix for SSG with pl name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1591 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 20:39:34 +00:00
andrewk
5354c1876c
De Novo SNP caller as presented at 1KG meeting on 9/10/09 with min LOD 5 calls required from both parents and a LOD 5 call in the daughter gold standard concordant call set. All SNP calls must be present as bound RODs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1590 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 19:30:23 +00:00
hanna
0f3049652a
Start to build BWT abstractions, so we can present a reasonable facsimile of the BWT to the user no matter how it's represented on disk.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1589 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 18:23:15 +00:00
chartl
c3f77acd5e
Alteration to CoverageAndPowerWalker. It can now be flagged with -uc which will cause it to print not only the coverage on each strand that exceeds the quality score threshold, but also the total coverage on each strand as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1588 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 17:55:44 +00:00
chartl
d6a0b65ac9
Changes:
...
Rollback of Variant-related changes of r1585, additional PGC code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 16:23:01 +00:00
chartl
0c54aba92a
Changes:
...
@VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object.
@AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones.
@ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true.
Added:
@PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls
and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 15:01:50 +00:00
ebanks
e24c8d00d5
So, the VCF spec allows for an optional meta field in the header representing the date. However, using this field means that integration tests run on the vcf file will fail the MD5 test (which is what happened to the VariantFiltration test this morning after working just fine yesterday).
...
After consulting our resident expert (Aaron), we're going to (temporarily) remove the date from the vcf output until we can come up with a better solution. However, this shouldn't cause any short-term problems because the date truly is optional.
VF test's MD5s are updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1580 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 14:28:43 +00:00
aaron
296878e8e3
adding a basic implementation of the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1578 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:41:13 +00:00
aaron
5a64a80ab5
changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test using the build script.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:31:33 +00:00
depristo
c988205884
Notes for Aaron in SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:18:51 +00:00
ebanks
1362a56227
Added fasta tests and small fix to cleaner test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1575 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:13:11 +00:00
hanna
6de54dcd2a
Higher-level readers and writers for BWTs and suffix arrays.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1573 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 22:45:32 +00:00
depristo
0093482c62
N reference base fix for SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1572 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 21:19:36 +00:00
hanna
bc9fe31cf5
Cleanup of int-packed file readers / writers. All primitive writers for BWTs and SAs are in place; time to move on to compound reader / writers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1571 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:36:39 +00:00
asivache
d9f3e9493f
Does not return 0-length cigar elements anymore (used to do so when previous cigar element ended exactly at the segment boundary)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1570 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:05:55 +00:00
ebanks
cb31d5a0ab
VariantFiltration now outputs VCF. Important changes:
...
1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD.
2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF.
If there are any problems, go find Aaron.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:32 +00:00
asivache
dd0085c428
1) now is tolerant to sloppy cigar strings with 0-length elements (at the price of extra recursive call)
...
2) when reads with deletions are requested, adds to the pile just those: reads with 'D' over the current reference base, but not 'N'
3) next() now implements a loop: recursive forward calls to next() until a reference position with non-zero coverage is encountered were OK for (short) deletions, but long stretches of N's caused a stack overflow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1568 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:04 +00:00
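Item 3 above replaces recursion with a loop. The sketch below shows why that matters, using a hypothetical iterator over a per-position coverage array (not the LocusIterator code). Skipping zero-coverage positions in a while loop uses constant stack space, so a gap of a million positions is harmless. Skipping them by recursing once per position would risk a StackOverflowError.

```java
import java.util.NoSuchElementException;

/** Hypothetical iterator yielding only reference positions with non-zero coverage. */
final class CoveredPositionIterator {
    private final int[] coverage;  // coverage[pos] = number of reads covering pos
    private int pos = 0;

    CoveredPositionIterator(int[] coverage) { this.coverage = coverage; }

    /** Advances with a loop, not recursion, past positions with zero coverage. */
    private void skipUncovered() {
        while (pos < coverage.length && coverage[pos] == 0) pos++;
    }

    boolean hasNext() {
        skipUncovered();
        return pos < coverage.length;
    }

    int next() {
        skipUncovered();
        if (pos >= coverage.length) throw new NoSuchElementException();
        return pos++;
    }
}
```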
ebanks
542af6402e
output correct format for Sequenom SNPs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1567 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 19:21:53 +00:00
hanna
43d1c6741c
Cleanup. Separate common packing functionality into utils class. Make base packing utility as generic as possible.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1566 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:54:12 +00:00
kiran
3b1e966b4c
Lowercases the sequencing platform so that a difference in case doesn't lead to the failure to look up an entry in the hash.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1565 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:35:45 +00:00
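The fix in the commit above is to normalize the key before both storing and looking up, so "ILLUMINA" and "illumina" hit the same entry. A minimal sketch with a hypothetical PlatformTable; the Double value type is illustrative only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class PlatformTable {
    private final Map<String, Double> byPlatform = new HashMap<>();

    /** Keys are lower-cased on both insert and lookup. */
    void put(String platform, double value) { byPlatform.put(normalize(platform), value); }

    Double get(String platform) { return byPlatform.get(normalize(platform)); }

    // Locale.ROOT avoids locale-sensitive case mapping (e.g. the Turkish dotless i).
    private static String normalize(String platform) { return platform.toLowerCase(Locale.ROOT); }
}
```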
kiran
d82d6c0665
Excludes variants that fall below a certain LOD that changes as a function of depth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1564 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:34:16 +00:00
kiran
06eae52292
Throws an exception if you attempt to use a filter that doesn't exist.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1563 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:33:27 +00:00
asivache
1060b36288
Bug fix: 'N' cigar elements now treated properly; for all practical intents and purposes, N is the same as D and should be treated as such, the difference is only in logical interpretation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1562 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:08:35 +00:00
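Treating N like D, as the commit above does, means both operators consume reference bases but no read bases. The sketch below computes the reference span of a CIGAR string that way. It also tolerates 0-length elements, in the spirit of the related r1568/r1570 changes. CigarSpan is a hypothetical helper. The operator classification follows the SAM specification.

```java
final class CigarSpan {
    /** Number of reference bases spanned by a CIGAR string, e.g. "10M2D5M" or "20M1000N30M". */
    static int referenceLength(String cigar) {
        int total = 0;
        int len = 0;
        boolean haveDigits = false;
        for (int i = 0; i < cigar.length(); i++) {
            char ch = cigar.charAt(i);
            if (ch >= '0' && ch <= '9') {
                len = len * 10 + (ch - '0');
                haveDigits = true;
                continue;
            }
            if (!haveDigits) throw new IllegalArgumentException("operator without a length: " + cigar);
            switch (ch) {
                case 'M': case '=': case 'X': case 'D':
                case 'N':                  // N consumes reference exactly like D
                    total += len;          // a 0-length element just adds nothing
                    break;
                case 'I': case 'S': case 'H': case 'P':
                    break;                 // these consume no reference bases
                default:
                    throw new IllegalArgumentException("unknown CIGAR operator '" + ch + "' in " + cigar);
            }
            len = 0;
            haveDigits = false;
        }
        if (haveDigits) throw new IllegalArgumentException("trailing length without operator: " + cigar);
        return total;
    }
}
```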
chartl
9c7f456510
Changed the short name on the PoolSize cmd line argument
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1560 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:53:22 +00:00
chartl
9d69bd2c84
Modifications:
...
@CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position
@VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As of now, the variable is unused.
@AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them
New Additions:
Filtered Contexts
Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as
part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes
it a bit easier, as each of these objects is a child of AlignmentContext and can be instantiated from a "raw" AlignmentContext.
@FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering
those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can
be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method.
@ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction
@ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction
@QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided).
A unit test bamfile and associated unit tests for these are in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:49:52 +00:00
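The filtered-context design above reduces each new filter to one method. This sketch mirrors that shape with hypothetical, simplified types (PileupRead, FilteredContext) rather than AlignmentContext. The default threshold of 22 comes from the commit message, and "above" is read as strictly greater.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal read: strand plus the base quality at this locus. */
final class PileupRead {
    final boolean reverseStrand;
    final int baseQuality;
    PileupRead(boolean reverseStrand, int baseQuality) {
        this.reverseStrand = reverseStrand;
        this.baseQuality = baseQuality;
    }
}

/** Subclasses only decide which reads to keep; the base class applies the filter. */
abstract class FilteredContext {
    private final List<PileupRead> rawReads;
    private final List<Integer> rawOffsets;
    private List<PileupRead> reads;   // filled lazily, after subclass fields are initialized
    private List<Integer> offsets;

    protected FilteredContext(List<PileupRead> rawReads, List<Integer> rawOffsets) {
        this.rawReads = rawReads;
        this.rawOffsets = rawOffsets;
    }

    /** The one method a new filtered context must implement. */
    protected abstract boolean keep(PileupRead read, int offset);

    private void filterOnce() {
        if (reads != null) return;
        reads = new ArrayList<>();
        offsets = new ArrayList<>();
        for (int i = 0; i < rawReads.size(); i++) {
            if (keep(rawReads.get(i), rawOffsets.get(i))) {
                reads.add(rawReads.get(i));
                offsets.add(rawOffsets.get(i));
            }
        }
    }

    List<PileupRead> getReads() { filterOnce(); return reads; }
    List<Integer> getOffsets() { filterOnce(); return offsets; }
}

final class ReverseReadsContext extends FilteredContext {
    ReverseReadsContext(List<PileupRead> r, List<Integer> o) { super(r, o); }
    @Override protected boolean keep(PileupRead read, int offset) { return read.reverseStrand; }
}

final class ForwardReadsContext extends FilteredContext {
    ForwardReadsContext(List<PileupRead> r, List<Integer> o) { super(r, o); }
    @Override protected boolean keep(PileupRead read, int offset) { return !read.reverseStrand; }
}

final class QualityScoreThresholdContext extends FilteredContext {
    static final int DEFAULT_THRESHOLD = 22;  // default named in the commit message
    private final int threshold;
    QualityScoreThresholdContext(List<PileupRead> r, List<Integer> o) { this(r, o, DEFAULT_THRESHOLD); }
    QualityScoreThresholdContext(List<PileupRead> r, List<Integer> o, int threshold) {
        super(r, o);
        this.threshold = threshold;
    }
    @Override protected boolean keep(PileupRead read, int offset) { return read.baseQuality > threshold; }
}
```

Filtering lazily, rather than in the base constructor, avoids a classic Java pitfall. If the constructor called keep(), the subclass's threshold field would still be 0 at that point.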
depristo
d9588e6083
bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:36:12 +00:00
asivache
0721c450c2
Bug fix: single unmapped read now keeps mapping qual 0 after remapping, not 37!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1557 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:29:34 +00:00
asivache
df11618092
Set default value of useLocusIteratorByHanger to FALSE. Otherwise the -LIBH flag is useless and there'd be no way to "unset" the 'true' value. Old version was (always) using LocusIteratorByHanger. Now default iterator is indeed LocusIteratorByState, and -LIBH will switch back to the old one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1556 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:09:09 +00:00
depristo
eeb9b6eb13
GenotypeLikelihoods now supports a cache per subclass, avoiding genotyping clashes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1554 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 10:39:14 +00:00
ebanks
0cc219c0df
-Added unit test for walkers dealing with intervals for cleaning
...
-I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been. Hooray for unit tests!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 02:35:17 +00:00
depristo
ec0f6f23c7
LocusIteratorByState is now the system default. Fixed Aaron's build problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
aaron
ea6ffd3796
initial VariantEvalWalker test. More to be added soon...
...
Also fixed the case where MD5 sums had leading zeros clipped off
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1551 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:02:04 +00:00
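One common way to lose leading zeros in an MD5 string is formatting the digest with BigInteger.toString(16), which drops them. The commit does not say that was the cause here, so treat this as an illustrative assumption. Md5Hex is a hypothetical helper that contrasts the clipped form with a version padded to 32 hex digits.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Hex {
    static byte[] digest(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Buggy pattern: toString(16) omits leading zero digits. */
    static String clipped(String s) { return new BigInteger(1, digest(s)).toString(16); }

    /** Fixed: always 32 hex digits (128 bits), zero-padded on the left. */
    static String padded(String s) { return String.format("%032x", new BigInteger(1, digest(s))); }
}
```

For example, the MD5 of "a" is 0cc175b9c0f1b6a831c399e269772661, and clipped() would drop its first character.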
hanna
adce3bd536
My reference implementation is now generating a BWT which matches BWT-SW's.
...
Note to self: never give project status in an svn log.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1550 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 22:11:03 +00:00
hanna
f22f590192
Successfully writing .sa files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1549 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 17:34:34 +00:00
sjia
600c234643
Starting code on phasing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1548 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 15:20:38 +00:00
aaron
3276e01e5f
fixing the build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1546 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 13:13:55 +00:00
kiran
f963cfcb21
Made enum listing header fields public.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1545 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 06:12:59 +00:00
kiran
fd20f5c2e8
For a file or files backed by a ROD implementing AllelicVariant, outputs a VCF file summarizing the information. Metadata like Hapmap and dbSNP membership, genotype LOD, read depth, etc, are annotated appropriately. The results output by this program are equivalent to those given by Gelis2PopSNPs.py.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1544 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 06:12:18 +00:00
ebanks
4a95f2181d
print out the right variant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1543 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 01:37:35 +00:00
sjia
5791da17ae
Updated to reference HLA database of unique 4 digit alleles
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1542 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 22:12:56 +00:00
ebanks
5dbba6711c
Lots of changes: (I'll send email out in a sec)
...
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing).
3) Have indel rod print samples
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:12:09 +00:00
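Item 2 above switches VariantFiltration module arguments to key=value form, so values can no longer be mixed up by position. The sketch below shows the idea with a hypothetical KeyValueArgs parser. The comma separator and the error behavior are assumptions, not the tool's documented syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyValueArgs {
    /** Parses a hypothetical "key1=value1,key2=value2" argument into an ordered map. */
    static Map<String, String> parse(String arg) {
        Map<String, String> out = new LinkedHashMap<>();
        if (arg == null || arg.trim().isEmpty()) return out;
        for (String pair : arg.split(",")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? "" : pair.substring(0, eq).trim();
            String value = eq < 0 ? "" : pair.substring(eq + 1).trim();
            if (key.isEmpty() || value.isEmpty())   // a bare positional value is rejected
                throw new IllegalArgumentException("expected key=value but got '" + pair.trim() + "'");
            if (out.put(key, value) != null)
                throw new IllegalArgumentException("duplicate key '" + key + "'");
        }
        return out;
    }
}
```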
depristo
1c3d67f0f3
Improvements to the CountCovariates and TableRecalibrator, as well as regression tests for SLX and 454 data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
depristo
2b0d1c52b2
General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:13:37 +00:00
sjia
471ca8201e
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1537 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:12:46 +00:00
aaron
0cc634ed5d
-Renamed rodVariants to RodGeliText
...
-Remove KGenomesSNPROD
-Remove rodFLT
-Renamed rodGFF to RodGenotypeChipAsGFF
-Fixed a problem in SSGenotypeCall
-Added basic SSGenotype Test class
-Make VCFHeader constructors public
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 18:40:43 +00:00
ebanks
fd1c72c151
Fixed package name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1535 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:40:06 +00:00
ebanks
6c476514f8
Moved to core. Wiki pages are going up; unit tests will be written soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1533 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:09:11 +00:00
ebanks
42c71b4382
Fix for Kris: now SNPs aren't masked by default (only when they come from a mask rod) and we can design Sequenom validation assays for them.
...
I'll move this all to core in a bit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1532 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 14:52:06 +00:00
ebanks
849dce799d
This rod was all wrong for generating the alternate snp alleles (it returned null or even the wrong value); fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1531 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 14:21:46 +00:00
depristo
a08c68362e
Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* then compares the geli MD5 sum to the expected one!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 12:39:06 +00:00
aaron
3c2ae55859
changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 05:31:15 +00:00
ebanks
2241173fff
In order to help learn python, I decided to convert Michael's DoC python script to Java; the CoverageHistogram now spits out standard deviations for a good Gaussian fit.
...
This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1528 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 02:23:57 +00:00
chartl
544900aa99
Migration of some core calculations (log-likelihood probabilities, etc.) from CoverageAndPowerWalker into static methods in PoolUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1527 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 21:43:29 +00:00
chartl
93cedf4285
---------------
...
| Added items |
---------------
@/varianteval/PoolAnalysis
Interface to identify variant analyses that are pool-specific.
@/varianteval/BasicPoolVariantAnalysis
Nearly the same as BasicVariantAnalysis with the addition of a protected integer (numIndividualsInPool)
which holds the pool size. One soulcrushing change is that "protected String filename" needed to
become "protected String[] filename" since now multiple truth files may be looked at. It was tempting
to make the change in BasicVariantAnalysis with some default methods that would maintain usability of
the remainder of the VariantAnalysis objects, but I decided to hold off. We can always merge these
together later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1526 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 21:26:04 +00:00
sjia
ee06c7f29f
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1525 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:41:12 +00:00
sjia
043c97eede
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1524 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:34:42 +00:00
aaron
c849282e44
reverting the HLA walker changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1523 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:11:57 +00:00
asivache
5202d959bf
NM attribute changed in sam jdk (?) from Integer to Short, or maybe it is presented differently by the reader depending on whether SAM or BAM is processed; in any case, both Integer and Short are safe now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1522 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:03:32 +00:00
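Accepting the NM tag as either Integer or Short, as the commit above does, is simplest if the code goes through java.lang.Number. In real code the value would come from something like SAMRecord.getAttribute("NM"), which returns an Object. The NmAttribute helper and its -1 "absent" convention below are hypothetical.

```java
final class NmAttribute {
    /** Edit distance from an NM tag value of any integral boxed type; -1 if the tag is absent. */
    static int editDistance(Object nmValue) {
        if (nmValue == null) return -1;
        if (nmValue instanceof Number) return ((Number) nmValue).intValue();  // Integer, Short, Byte...
        throw new IllegalArgumentException("unexpected NM type: " + nmValue.getClass().getName());
    }
}
```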
sjia
ada4c5a13c
Small change to debug printing code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1521 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 18:31:21 +00:00
kiran
c3aaca1262
Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into its own SAMFileReader-like entity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1520 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 17:20:16 +00:00
asivache
499b3536a4
Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1519 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 16:45:07 +00:00
ebanks
5bd99fc1c4
VariantFiltration moved to core.
...
Another win for the team.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1517 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:41:41 +00:00
chartl
5130ca9b94
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1516 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:17:02 +00:00
depristo
bdd0a6f9fa
change to make build work
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1511 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 13:43:10 +00:00
depristo
b01ac9de0c
High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and more potentially) speed ups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still in testing but passes validating pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 03:06:25 +00:00
jmaguire
e2780c17af
Checkin of the Multi-Sample SNP caller.
...
Doesn't work yet; the same command I used to use now causes GATK to throw an exception.
Will check with Matt & Aaron tomorrow, then do a regression test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1509 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 00:23:28 +00:00
hanna
e2a79c5cd9
Checkpoint. The BWT that we generate now matches the first 16% of the BWT that BWT-SW generates. Cleaned up output streams to separate the byte packing / word packing from the data structure generation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1508 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 22:18:17 +00:00
ebanks
3dfc77dc89
Add an indel rod which represents the initial point of the indel only
...
(useful for alternate reference making)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1507 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 19:32:29 +00:00
asivache
58debd7e56
A convenience shortcut, isReadUnmapped(), was added: per the SAM format specification, the 'read unmapped' flag is not always required to be set for an unmapped read, so this method checks both the flag and the alignment reference index/start (if those are set to '*', the spec does not require the flag!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1506 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 17:00:39 +00:00
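The isReadUnmapped() rule described in the entry above can be sketched as follows. This is written in Python rather than the GATK's Java, and it is not the actual AlignmentUtils code. SamRecord is a hypothetical stand-in for a SAM record that exposes the FLAG, RNAME and POS columns. The sketch assumes that an unset reference ('*') or an unset start (POS 0) is enough to treat the read as unmapped, whether or not the 0x4 flag is set.

```python
from dataclasses import dataclass

READ_UNMAPPED_FLAG = 0x4  # SAM FLAG bit 0x4: segment unmapped


@dataclass
class SamRecord:
    """Hypothetical minimal stand-in for a SAM record (not the Picard/GATK class)."""
    flag: int
    ref_name: str  # RNAME column; '*' when no reference is set
    pos: int       # POS column, 1-based; 0 when no alignment start is set


def is_read_unmapped(rec: SamRecord) -> bool:
    """Treat a read as unmapped if the 0x4 flag is set, or if it has no
    reference name or no alignment start (assumption: either is sufficient)."""
    if rec.flag & READ_UNMAPPED_FLAG:
        return True
    return rec.ref_name == "*" or rec.pos == 0
```

Checking the flag first keeps the common case cheap. The fallback on RNAME/POS catches records that omit the flag, which is the situation the commit message describes.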
aaron
0e6feff8f2
fixed locus pile-up limiting problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1505 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 16:56:44 +00:00
hanna
d8aff9a925
Bug fixes. Was ignoring the '$' character in a few places where I shouldn't have been.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1504 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 16:27:31 +00:00
ebanks
55013eff78
Re-revert back to point estimation for now. We need to do this right, just not yet.
...
Also, it's safer to let colt do the log factorial calculations for us.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1503 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 15:33:18 +00:00
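The entry above prefers a library routine for log factorials. The reason is that n! overflows a double for fairly small n, while ln(n!) stays finite. Here is a minimal Python sketch of a binomial log-probability computed entirely in log space. It uses Python's math.lgamma (ln(n!) = lgamma(n + 1)) in place of the colt routine the commit refers to. The function names are illustrative only.

```python
import math


def log_factorial(n: int) -> float:
    """ln(n!) via the log-gamma function: ln(n!) = lgamma(n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return math.lgamma(n + 1)


def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """ln P(X = k) for X ~ Binomial(n, p), without ever forming n! itself."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("need 0 <= p <= 1")
    # Degenerate p: all mass sits on k == 0 (p == 0) or k == n (p == 1).
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = log_factorial(n) - log_factorial(k) - log_factorial(n - k)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)
```

For example, exp(log_binomial_pmf(2, 4, 0.5)) is 6/16 = 0.375. log_binomial_pmf(500, 1000, 0.5) stays finite even though 1000! does not fit in a double.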
hanna
1ada085970
Cruddy implementation of BWT creation, for understanding and testing purposes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1501 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 02:16:56 +00:00
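For readers following the BWT entries in this stretch of the log, the textbook "cruddy" construction sorts every rotation of the text plus a unique '$' sentinel and reads off the last column. The '$' sorts before A/C/G/T in ASCII. This is a minimal Python sketch for understanding and testing only, not the GATK or BWT-SW code. The naive inverse rebuilds the sorted rotation table column by column, which makes round-trip checks easy.

```python
def naive_bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform by sorting all rotations (O(n^2 log n))."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last character of each sorted rotation.
    return "".join(rot[-1] for rot in rotations)


def naive_inverse_bwt(bwt: str, sentinel: str = "$") -> str:
    """Invert the BWT by repeatedly prepending the BWT column and re-sorting."""
    table = [""] * len(bwt)
    for _ in range(len(bwt)):
        table = sorted(bwt[i] + table[i] for i in range(len(bwt)))
    # After len(bwt) rounds the table holds every sorted rotation; the one
    # ending in the (unique) sentinel is the original text plus sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]
```

For example, naive_bwt("banana") returns "annb$aa". A quadratic construction like this is only practical for small inputs, but it gives a reference output to compare an optimized builder against.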
ebanks
24d809133d
Oops - comment out the printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1500 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 01:45:56 +00:00
ebanks
91ccb0f8c5
Revert to having these filters use integration over binomial probs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1499 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 01:40:22 +00:00
aaron
05c164ec69
changing the default behavior to allow read pile-ups of any size (which may exceed the memory limit); users can then select their own read limit. The default of 100K was arbitrary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1498 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 14:46:00 +00:00
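The new default described in the entry above treats "no limit" as the default and makes a read cap opt-in. A minimal Python sketch of that behavior follows. build_pileup and max_reads are hypothetical names, not GATK arguments. The sketch simply stops collecting reads at the cap; what the real engine did when a user-set limit was hit is not stated in the log.

```python
import itertools
from typing import Iterable, List, Optional


def build_pileup(reads: Iterable[str], max_reads: Optional[int] = None) -> List[str]:
    """Collect the reads covering one locus.

    max_reads=None (the default) means no limit, so memory use grows with
    coverage; a non-negative integer caps the pileup at that many reads.
    """
    if max_reads is None:
        return list(reads)
    if max_reads < 0:
        raise ValueError("max_reads must be non-negative")
    return list(itertools.islice(reads, max_reads))
```

Using None rather than a large sentinel such as 100000 makes "unlimited" explicit, instead of hiding an arbitrary number in the default.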