- summary output now reported as a percentage rather than proportion; 2 sigfigs
- fixed minor bug where FNR was calculated over total calls rather than total variant sites
- column headers are_now_contiguous_strings
- spacing fixed
- "No Call" separated from "Ref Call" as its own column
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1644 348d0f76-0448-11de-a6fe-93d51630548a
Additional output & better output formatting. It has now undergone a good five hours of testing; and for pools of size 1 outputs exactly the same statistics as GenotypeConcordance (when GenotypeConcordance is modified to do nothing on reference='N'); and for pools of many sizes outputs close to the expected (by genetics) statistics. Looks like this is working properly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1642 348d0f76-0448-11de-a6fe-93d51630548a
Deletion: PooledGenotypeConcordanceNew
Rewrite: PooledGenotypeConcordance. It works, and is blazing fast compared to the earlier version (1 order of magnitude speedup)! And is now entirely non-hackey, as opposed to before when there were some hacky bits.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1640 348d0f76-0448-11de-a6fe-93d51630548a
Iterates synchronously over two (name-ordered) single-end alignment SAM files with, possibly, multiple alignments per read and for each read name encountered returns pairs<all alignments for end1, all alignments for end2>
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1639 348d0f76-0448-11de-a6fe-93d51630548a
The Strand filter then needs to ignore those bases when determining bias.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a
-added option to have "D"s inserted for deleted bases in pileup strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1635 348d0f76-0448-11de-a6fe-93d51630548a
@PooledGenotypeConcordanceNew - a new version of the pooled genotype concordance test for Variant Eval. Code altered to be more extensible, use a private class for handling the count tables so it doesn't gunk up the code in the test itself, and for easy debugging. The hackier methods from the original were rewritten properly. Currently computes more statistics that it outputs. Code compiles, is never called by anything, and breaks none of the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1632 348d0f76-0448-11de-a6fe-93d51630548a
2. Fix ratio printouts (for params file)
3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a
don't print out intervals to be merged if they're not within the global -L intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a
@NQSMismatchCovariantWalker - Walks along the gene calculating the table
# NQS
# Q score
# mismatches at non-dbsnp sites
# total number of bases at non-dbsnp sites
And prints it out at the end.
Changes:
@PooledGenotypeConcordance now works. Takes a path to a file listing a bunch of hapmap IDs in whatever pool we want to check, reads those in, and checks for concordance by name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1614 348d0f76-0448-11de-a6fe-93d51630548a
but mostly altering the code so it's much more readable and understandable, and much less hacky-looking.
ADDED:
@Quad: This is just like Pair, except with four fields. In the original CoverageAndPowerWalker I often used
a pair of pairs to hold things, which made the code nigh unreadable.
@SQuad: An extension of Quad for when you want to store objects of the same type. Let's you simply declare
new SQuad<X> rather than new Quad<X,X,X,X>
@ReadOffsetQuad: An extension of Quad specifically for holding two lists of reads and two lists of offsets
Supports construction from AlignmentContexts and conversion to AlignmentContexts (given
a GenomeLoc). There are methods that make it very clear what the code is doing (getSecondRead()
rather than the cryptic getThird() )
@PowerAndCoverageWalker: The new version of CoverageAndPowerWalker. If the tests all go well, then I'll remove
the old version. New to this version is the ability to give an output file directly
to the walker, so that locus information prints to the file, while the final reduce
prints to standard out. Bootstrap iterations are now a command line argument rather
than a final int; and users can instruct the walker to print out the coverage/power
statistics for both the original reads, and those reads whose quality score exceeds
a user-defined threshold.
CHANGES:
@PoolUtils: Altered methods to accept as argumetns, and return, Quad objects. Added a random partition method
for bootstrapping.
@CoverageAndPowerWalker: Altered methods to work with the new PoolUtils methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1602 348d0f76-0448-11de-a6fe-93d51630548a
Rollback of Variant-related changes of r1585, additional PGC code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a
@VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object.
@AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones.
@ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true.
Added:
@PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls
and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a
After consulting our resident expert (Aaron), we're going to (temporarily) remove the date from the vcf output until we can come up with a better solution. However, this shouldn't cause any short-term problems because the data truly is optional.
VF test's MD5s are updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1580 348d0f76-0448-11de-a6fe-93d51630548a
1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD.
2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF.
If there are any problems, go find Aaron.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a
2) when reads with deletions are requested, adds to the pile just those: reads with 'D' over the current reference base, but not 'N'
3) next() now implements a loop: recursive forward iteration calls to next() until ref. position with non-zero coverage is encountered were OK for (short) deletions, but with long stretches of N's they end up with stack overflow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1568 348d0f76-0448-11de-a6fe-93d51630548a
@CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position
@VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As now, the variable is unused.
@AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them
New Additions:
Filtered Contexts
Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as
part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes
it a bit easier, as each of these objects are children of AlignmentContext, and can be instantiated from a "raw" AlignmentContext.
@FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering
those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can
be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method.
@ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction
@ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction
@QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided).
A unit test bamfile and associated unit tests for these are in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a
-Bamboo will now run the integration tests once a day, and the normal units tests on each check-in.
-Also added a bunch of unit tests for VariantEval walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1555 348d0f76-0448-11de-a6fe-93d51630548a
-I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been. Hooray for unit tests!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a
Also fixed the case where MD5 sums had leading zero's clipped off
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1551 348d0f76-0448-11de-a6fe-93d51630548a
Note to self: never give project status in an svn log.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1550 348d0f76-0448-11de-a6fe-93d51630548a
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run though it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mox up values (like I had been doing).
3) Have indel rod print samples
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
-Remove KGenomesSNPROD
-Remove rodFLT
-Renamed rodGFF to RodGenotypeChipAsGFF
-Fixed a problem in SSGenotypeCall
-Added basic SSGenotype Test class
-Make VCFHeader constructors public
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a
I'll move this all to core in a bit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1532 348d0f76-0448-11de-a6fe-93d51630548a
This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1528 348d0f76-0448-11de-a6fe-93d51630548a
| Added items |
---------------
@/varianteval/PoolAnalysis
Interface to identify variant analyses that are pool-specific.
@/varianteval/BasicPoolVariantAnalysis
Nearly the same as BasicVariantAnalysis with the addition of a protected integer (numIndividualsInPool)
which holds the pool size. One soulcrushing change is that "protected String filename" needed to
become "protected String[] filename" since now multiple truth files may be looked at. It was tempting
to make the change in BasicVariantAnalysis with some default methods that would maintain usability of
the remainder of the VariantAnalysis objects, but I decided to hold off. We can always merge these
together later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1526 348d0f76-0448-11de-a6fe-93d51630548a
Doesn't work yet; same command I used to use now causes GATK to throw an exception.
Will check with Matt & Aaron tomorrow, then do a regression test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1509 348d0f76-0448-11de-a6fe-93d51630548a
Also, it's safer to let colt do the log factorial calculations for us.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1503 348d0f76-0448-11de-a6fe-93d51630548a
The user is warned if a locus exceeds this threshold, and no more reads are added.
Also CombineDup walker had an incorrect package name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a
Will be made into a utility routine when BWA comes together.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1494 348d0f76-0448-11de-a6fe-93d51630548a
This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a