rpoplin
cea544871d
Fixed an issue with recalibrating original quality scores above Q40. There is a new option -maxQ which sets the maximum quality score possible for when a RecalDatum tries to compute its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting q scores above Q40. Added an integration test which makes use of this new -maxQ option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 13:50:30 +00:00
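
Note on the -maxQ entry above: a minimal sketch of how a quality score might be derived from a mismatch rate and capped at a configurable maximum. It assumes the standard Phred scaling Q = -10*log10(rate). The class and method names are hypothetical and are not the actual RecalDatum code, which may smooth the counts differently.

```java
// Hypothetical illustration only; not the GATK RecalDatum implementation.
final class QualFromMismatchRate {
    private QualFromMismatchRate() {}

    /**
     * Phred-scaled quality computed from an empirical mismatch rate,
     * capped at maxQ (the role the -maxQ option plays in the entry above).
     */
    static int empiricalQuality(long observations, long mismatches, int maxQ) {
        if (observations <= 0 || mismatches < 0 || mismatches > observations) {
            throw new IllegalArgumentException("need 0 <= mismatches <= observations and observations > 0");
        }
        if (mismatches == 0) {
            // A zero mismatch rate would give an infinite score, so fall back to the cap.
            return maxQ;
        }
        double rate = (double) mismatches / observations;
        double q = -10.0 * Math.log10(rate);
        // Round to the nearest integer score and never exceed the cap.
        return (int) Math.min(Math.round(q), (long) maxQ);
    }
}
```

With a cap of 40, one mismatch in 100000 observations is reported as Q40; raising the cap to 50 lets the same datum report Q50.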
ebanks
6c739e30e0
1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease.
...
2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext. This is by no means complete - further updates coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 05:51:10 +00:00
depristo
a9245a58e2
Fix for incorrect exception throwing in VCFRecord. It is reasonable to ask for the non-ref allele freq at all ref sites. Was only passing in tests because isReference was broken
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2532 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 01:18:30 +00:00
depristo
7215526810
Fix to isReference() in VCFRecord. Change to VariantCounter to correctly count only non-genotype variants, as well as update to VariantEvalWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2531 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 00:03:29 +00:00
andrewk
6c4ac9e663
Updated HapMap2VCF to use the VCFGenotypeWriterAdapter interface; fixed bug in VCFParameters that affects VariantsToVCF and HapMap2VCF when reference is lower-cased; added integration test for HapMap2VCF that checks for the lower-case issue by testing against Hg18 region that has lower-cased bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2530 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 21:27:11 +00:00
aaron
576594eda2
clean-up of the GATK paper genotyper, and better output formatting for the simple call format we emit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2529 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 20:54:56 +00:00
chartl
7e3e714d3c
Moving experimental annotations from core to oneoffs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2528 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 19:34:10 +00:00
chartl
a32245f7d2
Modifications:
...
QualityUtils - Stole the BaseUtils code for flipping reads around and applied it to quality scores
SecondBaseSkew - Nothing's really different, just a commented line
Additions (experimental annotations for future development of second-base annotation)
** I DO NOT INTEND FOR ANYONE TO USE THESE **
- ProportionOfNonrefBasesSupportingSNP
- ProportionOfSNPSecondBasesSupportingRef
- ProportionOfRefSecondBasesSupportingSNP
+ I hope these are self-explanatory
- QualityAdjustedSecondBaseLod
+ Adjust lod-score by 10*log10[P[second bases are as observed]]
Added walker:
QualityScoreByStrand - oneoff project that's being saved if i ever need it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2527 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 19:18:07 +00:00
asivache
eb899741e1
reverting last changes. no caching
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2526 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 18:59:37 +00:00
asivache
a17d725c35
Cache pileup bases and mapping quals after first call to getBases() and getMappingQuals(), respectively. Subsequent calls to these methods will return cached arrays.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2525 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 18:05:00 +00:00
ebanks
d6fb19bb67
Don't hard-code base qual max
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2524 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 17:21:44 +00:00
rpoplin
75809100c6
Use inheritance so that shared code isn't duplicated between the RecalDatums
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2523 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:45:16 +00:00
ebanks
fdd14e1a01
Proposed interface for VariantContext. It's currently an interface so it doesn't break the build...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2521 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:31:39 +00:00
rpoplin
e011a1b6f8
Cut the memory footprint of the RecalDatum in half to improve performance of CountCovariates when run with many covariates.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2520 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:12:27 +00:00
rpoplin
370a365147
Small runtime improvement in TableRecalibration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2519 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:51:12 +00:00
ebanks
b745c2f8d7
Fix for Jared: don't blow up if there are no samples in the input (since that's allowed) - but warn the user just in case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2518 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:37:06 +00:00
depristo
1e462419da
trivial code restructuring, and commented out failed attempt to support sample selection with VCF. VariantEval2 go go go
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2516 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:04:27 +00:00
depristo
f857159343
useful convenience function to get a genotype associated with a particular sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2515 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:03:07 +00:00
depristo
34519b3e3b
Better printing support for false positives and false negatives in concordance tables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2514 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:02:40 +00:00
depristo
592749a7c1
isNBase method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2513 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:01:51 +00:00
depristo
5ce11c3dad
toString method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2512 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:01:20 +00:00
rpoplin
1c90e6a954
More informative error message in AnalyzeCovariates and cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2511 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:56:29 +00:00
depristo
bca3d1b943
useful convenience function to get a genotype associated with a particular sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2510 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:53:56 +00:00
depristo
ec774f62be
Some checking to protect the BasicGenotype
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2509 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:53:24 +00:00
rpoplin
71ecbe75d7
AnalyzeCovariates would crash with 'too many open files' exception when spawning Rscript jobs for every read group at once. It now waits for some to finish before spawning the rest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2508 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:19:02 +00:00
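
Note on the 'too many open files' entry above: one way to keep the number of simultaneously running jobs bounded is a fixed-size worker pool. This is a sketch of the general idea only. The class name and the use of a thread pool are assumptions, and the actual AnalyzeCovariates change may simply wait on batches of Rscript processes instead.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the AnalyzeCovariates code.
final class BoundedJobRunner {
    private BoundedJobRunner() {}

    /**
     * Runs every job, but never more than maxConcurrent at the same time.
     * In the scenario above each job would launch one external plotting
     * process and wait for it to exit.
     */
    static void runAll(List<Runnable> jobs, int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            for (Runnable job : jobs) {
                pool.execute(job); // queued until a worker thread is free
            }
        } finally {
            pool.shutdown(); // accept no new jobs; queued ones still run
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("jobs did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
    }
}
```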
depristo
21a50eedb5
Simple extension to VariantEval: --includeFilteredRecords will now keep filtered VCF records so you can see what the entire call set looks like. Looking forward to VariantEval v2 from Kiran.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2506 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:59:09 +00:00
depristo
8d13597a27
Temporary command-line support to enable rod walkers, if you know what you are doing this is safe.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2505 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:15:36 +00:00
ebanks
d8351cb9fc
Give Annotations access to rod data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2504 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 18:53:01 +00:00
ebanks
8b087305f3
Added back the MQ0 annotation - however, it's not yet standard (since mq0 reads are filtered out by default in the genotyper). But it'll work when using the Annotator as a standalone.
...
While I'm at it, change getPileup to getBasePileup to remove all of the deprecation warnings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2502 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 17:07:19 +00:00
hanna
a4b69d0adf
Misc bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2501 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 14:48:19 +00:00
depristo
c209ba55aa
More informative error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2499 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 13:55:20 +00:00
rpoplin
0a6bd5a270
CycleCovariate is now one-based so that 0 and -0 don't collide with each other. Solid recal modes now only change the inconsistent base and the previous base (along the direction of the read) instead of both the bases before and after. Removed estimatedNumberOfBins from the Covariate interface because it wasn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2498 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 20:52:15 +00:00
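
Note on the CycleCovariate entry above: a small sketch of why a signed, zero-based cycle is ambiguous at the first position. It assumes the sign of the cycle encodes some yes/no property of the read (the `negate` flag here is hypothetical); the names are illustrative and not the actual covariate code.

```java
// Hypothetical illustration only; not the GATK CycleCovariate class.
final class SignedCycle {
    private SignedCycle() {}

    /** Zero-based signed cycle: offset 0 maps to 0 whether or not it is negated. */
    static int zeroBased(int offset, boolean negate) {
        return negate ? -offset : offset;
    }

    /** One-based signed cycle: offset 0 maps to +1 or -1, so the two cases stay distinct. */
    static int oneBased(int offset, boolean negate) {
        int cycle = offset + 1;
        return negate ? -cycle : cycle;
    }
}
```

Because integers have no distinct -0, the zero-based form merges the two cases at the first position, while the one-based form keeps them in separate bins.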
ebanks
ed2fff13aa
-Misc improvements to VCF code
...
-Small fix to callset concordance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2497 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 02:28:47 +00:00
hanna
29c129aced
Added very primitive read fishing walker with lots of hard coding. Fixed bugs encountered when testing read fishing in Ecoli.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2496 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 00:54:57 +00:00
ebanks
7b702b086f
You don't need to be bi-allelic to have a non-ref alt allele frequency, but you do have to be a variant.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2495 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 22:02:39 +00:00
ebanks
b668d32cf1
Updated the min mapping quality and min base quality defaults to be 10 in both cases (and updated all integration tests) as suggested by Mark.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2494 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 21:31:04 +00:00
hanna
b6ecc9e151
Support for ad-hoc reference sequences. Also reenabled BWA/Java integration test, which was commented out and the data backing it up deleted without my knowledge. Unfortunately, since the data was deleted, I had to regenerate the data and a new md5. Hopefully the aligner output is still correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2493 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-02 20:19:14 +00:00
asivache
ad549eacfd
Now that we changed how deletions are represented, got to update MD5...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2491 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 22:00:58 +00:00
asivache
46362ce532
In extended event lines, now prints deletions in verbose format as well (e.g. "-AAT")
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2490 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:57:20 +00:00
asivache
a18e31f5b8
If alignment context at the locus holds extended event, get rod metadata and (importantly) reference bases for the whole span of the event (if it is a deletion that is, insertions still have length 0 on the ref!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2489 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:56:25 +00:00
asivache
a41cb0701b
Now can generate verbose String representation of deletions (e.g. "-AAT") if reference bases are provided as an argument to getEventStringWithCounts().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2488 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:54:50 +00:00
asivache
89791d730e
Compute and cache the length of the longest deletion observed at the site; ReadBackedExtendedEventPileup now has a getter to access that value.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2487 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:19:39 +00:00
asivache
9c41ac252f
Disable testSingleBPFailure - getReferenceContext() now accepts genome locs of length > 1 as its argument, so there's nothing to test...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2486 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:12:00 +00:00
asivache
8932e67325
Removed sanity check that required GenomeLoc argument to be strictly 1-base long. We need to relax this in order to be able to pass around a reference context containing full-length chunk of deleted reference bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2485 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 20:14:08 +00:00
hanna
497ae700c4
A rethink of the existing BAM block extraction code: rather than working in chunk space directly, stream data in block space, converting to chunk space on demand.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2484 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:19:51 +00:00
rpoplin
80658fd99e
AnalyzeCovariates gets the same performance improvements as the recalibrator. NHashMap class is removed completely.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2483 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:10:10 +00:00
rpoplin
9b2733a54a
Misc clean up in the recalibrator related to the nested hash map implementation. CountCovariates no longer creates the full flattened set of keys and iterates over them. The output csv file is in sorted order by default now but there is a new option -unsorted which can be used to save a little bit of run time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2482 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 16:58:04 +00:00
asivache
4aeb50c87d
Added: integration test for extended pileup (with indels included)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2481 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 23:02:23 +00:00
asivache
c928347c0c
Extended event pileups are more verbose now: following a sequence of 'D','I', and '.' symbols, actual distinct events are listed along with their counts (example: +AAA:3,+AAC:1 for the total of 4 indel observations with 3 reads showing +AAA and one read showing +AAC)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2480 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:44:18 +00:00
asivache
8330058216
method added: getEventStringsWithCounts()
...
Returns list of Pairs <String,Integer>, where each pair consists of a unique indel event observed at the site and the total number of observations of that event. String representation for insertions is verbose (e.g. +ACT), while deletions are represented as "5D" (since read backed pileup has no reference information, so we can not get actual sequence of deleted bases)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2479 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:41:58 +00:00
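
Note on the two extended-event entries above: a minimal sketch of tallying distinct indel events and printing them in the "+AAA:3,+AAC:1" style quoted in the log. The class and method names are hypothetical; the real getEventStringsWithCounts() returns a list of Pairs and works on pileup elements rather than plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the ReadBackedExtendedEventPileup code.
final class EventCounts {
    private EventCounts() {}

    /** Counts each distinct event string, keeping first-seen order. */
    static Map<String, Integer> count(List<String> events) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String event : events) {
            counts.merge(event, 1, Integer::sum);
        }
        return counts;
    }

    /** Renders the counts as comma-separated "event:count" items. */
    static String format(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
    }
}
```

Three reads showing +AAA and one showing +AAC would be rendered as +AAA:3,+AAC:1, matching the example in the log.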