Eric Banks
6790e103e0
Moving lots of walkers back from protected to public (along with several of the VA annotations).
...
Let's see whether Mauricio's automatic git hook really works!
2013-01-24 11:42:49 -05:00
Chris Hartl
a3b98daf1a
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2013-01-23 14:49:34 -05:00
Chris Hartl
7fcfa4668c
Since GenotypeConcordance is now a standalone walker, remove the old GenotypeConcordance evaluation module and the associated integration tests.
2013-01-23 14:47:23 -05:00
Mark DePristo
8026199e4c
Updating md5s for CountReadsInActiveRegions and HaplotypeCaller to reflect new activity profile mechanics
...
-- In this process I discovered a few missed sites in the old code. The new approach actually produces better HC results than the previous version.
2013-01-23 13:46:01 -05:00
Mark DePristo
8d9b0f1bd5
Restructure ActivityProfiler into root class ActivityProfile and derived class BandPassActivityProfile
...
-- Required before I jump in and redo the entire activity profile so it can be run incrementally
-- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overloaded from the base class.
-- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality. Almost all of the misc. walker changes are due to this name update
-- Code cleanup and docs for TraverseActiveRegions
-- Expanded unit tests for ActivityProfile and ActivityProfileState
2013-01-23 13:45:21 -05:00
Chris Hartl
c500e1d8ac
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2013-01-22 15:31:30 -05:00
Chris Hartl
d33c755aea
Adding docs.
2013-01-22 15:29:33 -05:00
Chris Hartl
7060e01a8e
Fix for broken unit test plus some minor changes to comments. Unit tests were broken by my pulling the site status utility function into the enum. Thankfully the unit tests caught my silly duplication of a line.
2013-01-22 15:14:41 -05:00
Mauricio Carneiro
7b8b064165
Last manual license update (hopefully)
...
If everyone updates their git hook accordingly, this will be the last time I have to manually run the script.
GSATDG-5
2013-01-18 16:13:07 -05:00
Ami Levy-Moonshine
0fb7b73107
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-18 15:03:42 -05:00
Ami Levy-Moonshine
826c29827b
change the default VCFs gatherer of the GATK (not just the UG)
2013-01-18 15:03:12 -05:00
Eric Banks
cac439bc5e
Optimized the Allele Biased Downsampling: now it doesn't re-sort the pileup but just removes reads from the original one.
...
Added a small fix that slightly changed md5s.
2013-01-18 11:17:31 -05:00
Chris Hartl
08d2da9057
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2013-01-18 10:28:45 -05:00
Chris Hartl
bf5748a538
Forgot to actually put in the md5. Also with the new change to record pairing and filtering, the multiple-records integration test changed: the indel records (T/TG | T/TGACA) are matched up (rather than left separate) resulting in properly identifying mismatching alleles, rather than HET-UNAVAILABLE and UNAVAILABLE-HET. Very nice.
2013-01-18 10:25:36 -05:00
Chris Hartl
91030e9afa
Bugfix: records that get paired up during the resolution of multiple-records-per-site were not going into genotype-level filtering. Caught via testing.
...
Testing for moltenized output, and for genotype-level filtering. This tool is now fully functional. There are three todo items:
1) Docs
2) An additional output table that gives concordance proportions normalized by records in both eval and comp (not just total in eval or total in comp)
3) Code cleanup for table creation (putting a table together the way I do takes -way- too many lines of code)
2013-01-18 09:49:48 -05:00
Eric Banks
39c73a6cf5
1. Ryan and I noticed that the FisherStrand annotation was completely busted for indels with reduced reads; fixed.
...
2. While making the previous fix and unifying FS for SNPs and indels, I noticed that FS was slightly broken in the general case for indels too; fixed.
3. I also fixed a minor bug in the Allele Biased Downsampling code for reduced reads.
2013-01-18 03:35:48 -05:00
Eric Banks
6a903f2c23
I finally gave up on trying to get the Haplotype/Allele merging to work in the HaplotypeCaller.
...
I've resigned myself instead to create a mapping from Allele to Haplotype. It's cheap so not a big deal, but really shouldn't be necessary.
Ryan and I are talking about refactoring for GATK2.5.
2013-01-18 01:21:08 -05:00
Eric Banks
6db3e473af
Better error message for bad qual
2013-01-17 10:30:04 -05:00
Eric Banks
953592421b
I think we got out of sync with the HC tests as we were clobbering each other's changes. Only differences here are to some RankSumTest values.
2013-01-17 09:19:21 -05:00
Eric Banks
ded659232b
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-16 22:49:56 -05:00
Eric Banks
a623cca89a
Bug fix for HaplotypeCaller, as reported on the forum: when reduced reads didn't completely overlap a deletion call,
...
we were incorrectly trying to find the reference position of a base on the read that didn't exist.
Added integration test to cover this case.
2013-01-16 22:47:58 -05:00
Eric Banks
dbb69a1e10
Need to use ints for quals in HaplotypeScore instead of bytes because of overflow (they are summed when haplotypes are combined)
2013-01-16 22:33:16 -05:00
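The overflow described in dbb69a1e10 can be reproduced in a few lines. This is a minimal sketch, not the HaplotypeScore code itself: the class and method names are hypothetical, and the only assumption is that per-base qualities are stored as Java bytes and summed.

```java
public final class QualSumSketch {
    // Accumulating in a byte wraps around once the total exceeds 127,
    // because the compound assignment silently narrows back to byte.
    static byte sumAsByte(byte[] quals) {
        byte total = 0;
        for (byte q : quals) {
            total += q;
        }
        return total;
    }

    // Accumulating in an int keeps the true sum.
    static int sumAsInt(byte[] quals) {
        int total = 0;
        for (byte q : quals) {
            total += q;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] quals = {60, 60, 60};
        System.out.println(sumAsByte(quals)); // -76 (180 wrapped into the byte range)
        System.out.println(sumAsInt(quals));  // 180
    }
}
```

Three quality-60 bases are already enough to wrap a byte accumulator, which is why summed qualities need a wider type.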
Chris Hartl
e15d4ad278
Addition of moltenize argument for moltenized tabular output. NRD/NRS not moltenized because there are only two columns.
2013-01-16 18:00:23 -05:00
Mark DePristo
3c476a92a2
Add dummy functionality (currently throws an error) to allow HC to include unmapped reads during assembly and calling
2013-01-16 16:25:36 -05:00
Eric Banks
4cf34ee9da
Bug fix to FisherStrand: do not let it output INFINITY. This all needs to be unit tested, but that's coming on the horizon.
2013-01-16 15:35:04 -05:00
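One common way to keep a Phred-scaled p-value finite, as 4cf34ee9da requires for FisherStrand, is to floor the p-value before taking the logarithm. The sketch below illustrates that idea only; the commit does not say how the fix was made, and the class name, method names, and the floor value are all assumptions.

```java
public final class PhredFloorSketch {
    // Hypothetical floor; the commit does not state which bound the real code uses.
    static final double MIN_PVALUE = 1e-320;

    // Phred scaling with the p-value clamped from below, so the result stays finite.
    static double phredScaled(double pValue) {
        return -10.0 * Math.log10(Math.max(pValue, MIN_PVALUE));
    }

    // The unclamped version: a p-value that underflows to 0.0 yields +Infinity.
    static double phredScaledUnbounded(double pValue) {
        return -10.0 * Math.log10(pValue);
    }

    public static void main(String[] args) {
        System.out.println(Double.isInfinite(phredScaledUnbounded(0.0))); // true
        System.out.println(Double.isInfinite(phredScaled(0.0)));          // false
    }
}
```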
Mark DePristo
2a42b47e4a
Massive expansion of ActiveRegionTraversal unit tests, resulting in several bugfixes to ART
...
-- UnitTests now include combinational tiling of reads within and spanning shard boundaries
-- ART now properly handles shard transitions, and does so efficiently without requiring hash sets or other collections of reads
-- Updating HC and CountReadsInActiveRegions integration tests
2013-01-16 15:30:00 -05:00
Eric Banks
e47a389b26
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-16 14:59:11 -05:00
Eric Banks
d18dbcbac1
Added tests for changing IUPAC bases to Ns, for failing on bad ref bases, and for the HaplotypeCaller not failing when running over a region with an IUPAC base.
...
Out of curiosity, why does Picard's IndexedFastaSequenceFile allow one to query for start position 0? When doing so, that base is a line feed (-1 offset to the first base in the contig) which is an illegal base (and which caused me no end of trouble)...
2013-01-16 14:55:33 -05:00
Khalid Shakir
4ffb43079f
Re-committing the following changes from Dec 18:
...
Refactored interval-specific arguments out of GATKArgumentCollection into IntervalArgumentCollection such that it can be used in other CommandLinePrograms.
Updated SelectHeaders to print out full interval arguments.
Added RemoteFile.createUrl(Date expiration) to enable creation of presigned URLs for download over http: or file:.
2013-01-16 12:43:15 -05:00
Eric Banks
445735a4a5
There was no reason to be sharing the Haplotype infrastructure between the HaplotypeCaller and the HaplotypeScore annotation since they were really looking for different things.
...
Separated them out, adding efficiencies for the HaplotypeScore version.
2013-01-16 11:10:13 -05:00
Eric Banks
392b5cbcdf
The CachingIndexedFastaSequenceFile now automatically converts IUPAC bases to Ns and errors out on other non-standard bases.
...
This way walkers won't see anything except the standard bases plus Ns in the reference.
Added option to turn off this feature (to maintain backwards compatibility).
As part of this commit I cleaned up the BaseUtils code by adding a Base enum and removing all of the static indexes for each of the bases. This uncovered a bug in the way the DepthOfCoverage walker counts deletions (it was counting Ns instead!) that isn't covered by tests. Fortunately that walker is being deprecated soon...
2013-01-16 10:22:43 -05:00
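The reference-base policy described in 392b5cbcdf (IUPAC ambiguity codes become N, anything else non-standard is an error) might look roughly like the sketch below. It is not the CachingIndexedFastaSequenceFile code: the class name, the method name, the exception type, and the choice to leave standard bases untouched in their original case are all assumptions.

```java
public final class RefBaseSketch {
    private static final String STANDARD = "ACGTN";
    // IUPAC nucleotide ambiguity codes other than N.
    private static final String IUPAC_AMBIGUITY = "RYSWKMBDHV";

    // Returns the base unchanged if it is standard, 'N' if it is an IUPAC
    // ambiguity code, and throws for anything else (for example a line feed).
    static byte normalize(byte base) {
        char upper = Character.toUpperCase((char) base);
        if (STANDARD.indexOf(upper) >= 0) {
            return base;
        }
        if (IUPAC_AMBIGUITY.indexOf(upper) >= 0) {
            return (byte) 'N';
        }
        throw new IllegalArgumentException("Illegal reference base, byte value " + base);
    }

    public static void main(String[] args) {
        System.out.println((char) normalize((byte) 'R')); // N
        System.out.println((char) normalize((byte) 'g')); // g
    }
}
```

With a check like this in place, walkers only ever see the standard bases plus N, which is the guarantee the commit message describes.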
Eric Banks
4fb3e48099
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-16 00:13:38 -05:00
Eric Banks
0d282a7750
Bam writing from HaplotypeCaller seems to be working on all my test cases. Note that it's a hidden debugging option for now.
...
Please let me know if you notice any bad behavior with it.
2013-01-16 00:12:02 -05:00
Chris Hartl
327169b283
Refactor the method that identifies the site overlap type into the type enum class (so it can be used elsewhere potentially).
...
Completed todo item: for sites like
(eval)
20 12345 A C
20 12345 A AC
(comp)
20 12345 A C
20 12345 A ACCC
the records will be matched by the presence of a non-empty intersection of alleles. Any leftover records are then paired with an empty variant context (as though the call was unique). This has one somewhat counterintuitive feature, which is that normally
(eval)
20 12345 A AC
(comp)
20 12345 A ACCC
would be classified as 'ALLELES_DO_NOT_MATCH' (and not counted in genotype tables), but in the presence of the SNP they are counted as EVAL_ONLY and TRUTH_ONLY respectively.
+ integration test
2013-01-15 12:13:45 -05:00
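The pairing rule from 327169b283 can be sketched as a greedy match on allele sets. This is an illustration under assumptions, not the GenotypeConcordance code: each record is reduced to its set of alternate alleles (the shared reference allele is left out, since it would make every intersection non-empty), matching is first-come first-served, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public final class RecordPairingSketch {
    // Pairs each eval record with the first unmatched comp record that shares
    // at least one alternate allele; leftovers on either side are paired with
    // "(none)", standing in for an empty variant context.
    static List<String> pair(List<Set<String>> evalAlts, List<Set<String>> compAlts) {
        List<String> pairs = new ArrayList<>();
        List<Set<String>> unmatchedComp = new ArrayList<>(compAlts);
        for (Set<String> eval : evalAlts) {
            Set<String> match = null;
            for (Set<String> comp : unmatchedComp) {
                if (!Collections.disjoint(eval, comp)) {
                    match = comp;
                    break;
                }
            }
            if (match != null) {
                unmatchedComp.remove(match);
                pairs.add(eval + " <-> " + match);
            } else {
                pairs.add(eval + " <-> (none)");
            }
        }
        for (Set<String> comp : unmatchedComp) {
            pairs.add("(none) <-> " + comp);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The site from the commit message: a SNP plus an insertion on each side.
        List<Set<String>> eval = Arrays.asList(
                Collections.singleton("C"), Collections.singleton("AC"));
        List<Set<String>> comp = Arrays.asList(
                Collections.singleton("C"), Collections.singleton("ACCC"));
        for (String p : pair(eval, comp)) {
            System.out.println(p);
        }
        // [C] <-> [C]
        // [AC] <-> (none)
        // (none) <-> [ACCC]
    }
}
```

The two insertions end up unpaired, which corresponds to the EVAL_ONLY and TRUTH_ONLY outcome the commit message calls counterintuitive.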
Eric Banks
d3baa4b8ca
Have Haplotype extend the Allele class.
...
This way, we don't need to create a new Allele for every read/Haplotype pair to be placed in the PerReadAlleleLikelihoodMap (very inefficient). Also, now we can easily get the Haplotype associated with the best allele for a given read.
2013-01-15 11:36:20 -05:00
Mark DePristo
3c37ea014b
Retire original TraverseActiveRegion, leaving only the new optimized version
...
-- Required some updates to MD5s, which was unexpected, and will be sorted out later with more detailed unit tests
2013-01-15 10:24:45 -05:00
Eric Banks
94800771e3
1. Initial implementation of bam writing for the HaplotypeCaller with -bam argument; currently only assembled haplotypes are emitted.
...
2. Framework is set up in the VariantAnnotator for the HaplotypeCaller to be able to call in to annotate dbSNP plus comp RODs. Until the HC uses meta data though, this won't work.
2013-01-15 10:19:18 -05:00
Chris Hartl
682c59ff04
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2013-01-14 13:27:34 -05:00
Chris Hartl
61bc334df1
Ensure output table formatting does not contain NaNs. For (0 eval ref calls)/(0 comp ref calls), set the proportion to 0.00.
...
Added integration tests (checked against manual tabulation)
2013-01-14 09:21:30 -05:00
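The NaN guard from 61bc334df1 amounts to special-casing a zero denominator before formatting. A minimal sketch, with a hypothetical class and method name and an assumed two-decimal output format:

```java
import java.util.Locale;

public final class ProportionSketch {
    // 0/0 in floating point is NaN; report 0.00 for that case instead.
    static String proportion(int numerator, int denominator) {
        double value = denominator == 0 ? 0.0 : (double) numerator / denominator;
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(proportion(0, 0)); // 0.00
        System.out.println(proportion(1, 4)); // 0.25
    }
}
```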
Ryan Poplin
a7fe334a3f
calculating the md5s for the new tests.
2013-01-11 15:43:52 -05:00
Ryan Poplin
65afec2a53
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-11 15:22:52 -05:00
Mark DePristo
85b529cced
Updating MD5s in HC and UG that changed due to new LIBS
...
-- Resolved what was clearly a bug in UG (GGA mode was returning a neighboring, equivalent indel site that wasn't in input list. Not ideal)
-- Trivial read count differences in HC
2013-01-11 15:17:19 -05:00
Mark DePristo
8b83f4d6c7
Near final cleanup of PileupElement
...
-- All functions documented and unit tested
-- New constructor interface
-- Cleanup some uses of old / removed functionality
2013-01-11 15:17:17 -05:00
Mark DePristo
fb9eb3d4ee
PileupElement and LIBS cleanup
...
-- function to create pileup elements in AlignmentStateMachine and LIBS
-- Cleanup pileup element constructors, directing users to LIBS.createPileupFromRead() that really does the right thing
2013-01-11 15:17:17 -05:00
Mark DePristo
cc1d259cac
Implement getLengthOfImmediatelyFollowingIndel and getBasesOfImmediatelyFollowingIndel in PileupElement
...
-- Added unit tests for this behavior. Updated users of this code
2013-01-11 15:17:17 -05:00
Mark DePristo
2c38310868
Create LIBS using new AlignmentStateMachine infrastructure
...
-- Optimizations to AlignmentStateMachine
-- Properly count deletions. Added unit test for counting routines
-- AlignmentStateMachine.java is no longer recursive
-- Traversals now use new LIBS, not the old one
2013-01-11 15:17:17 -05:00
Mark DePristo
b53286cc3c
HaplotypeCaller mode to skip assembly and genotyping for performance testing
...
-- Added HCPerformance evaluation Qscript
-- Added some docs about one of the HC integration tests
-- HaplotypeCaller / ART performance evaluation script
2013-01-11 15:17:16 -05:00
Ryan Poplin
e952296c10
Adding HC GGA integration test to cover duplicated input alleles.
2013-01-11 15:01:27 -05:00
Ryan Poplin
7f7f40f851
Adding additional HC GGA integration tests to cover more complicated input alleles.
2013-01-11 14:36:21 -05:00
Eric Banks
85baf71b39
Merged bug fix from Stable into Unstable
2013-01-11 11:05:27 -05:00
Eric Banks
d78539774f
Another RR bug: off by one error led to ArrayIndexOutOfBoundsException when working with multiple samples and the variant region ended 1 base after the end of the last read for a given sample.
2013-01-11 11:05:09 -05:00
Eric Banks
79b93f659c
Merged bug fix from Stable into Unstable
2013-01-11 09:20:13 -05:00
Eric Banks
67fafbb625
Forgot an include
2013-01-11 09:19:46 -05:00
Eric Banks
6bf0cc32f9
When reducing multiple samples it is possible to try to close a region that for a given sample has no reads. Currently we'd NPE. Fixed.
2013-01-11 09:16:19 -05:00
Eric Banks
e7906713d9
Moving some random walkers back to public as requested by Mark. Mauricio, will the licenses get updated automatically?
2013-01-11 02:03:43 -05:00
Eric Banks
3a51823c2a
Clean up imports
2013-01-10 23:35:01 -05:00
Eric Banks
e4b7b1955c
Forgot to add the note about length normalization to the QD docs
2013-01-10 23:34:06 -05:00
Eric Banks
ff5ac986d8
Fix docs for QD
2013-01-10 23:31:46 -05:00
Mauricio Carneiro
2a4ccfe6fd
Updated all JAVA file licenses accordingly
...
GSATDG-5
2013-01-10 17:06:41 -05:00
Mauricio Carneiro
dd177b1714
Removing fully commented out varianteval evaluators
...
- Files were completely commented out, and were screwing up my license script. Don't like them. Removed them.
GSATDG-5
2013-01-10 17:06:12 -05:00
Chris Hartl
80dec72c53
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2013-01-10 14:35:59 -05:00
Chris Hartl
31a5f88c4f
Expanded unit tests to cover the Concordance Metrics class fairly uniformly.
2013-01-10 14:33:47 -05:00
Ryan Poplin
1a18947abf
Adding new command line argument requested on the forum to control the maximum number of haplotypes that are sent forward for genotyping. In the presence of a large degree of heterozygosity the current algorithm breaks down and so this argument would need to be increased.
2013-01-09 15:54:02 -05:00
Ryan Poplin
487fb2afb4
Bug fix for the case of overlapping assembled and partially-assembled events created by the HC. Unfortunately the symbolic allele can't be combined with the indel allele because the reference basis will change.
2013-01-09 15:30:46 -05:00
Chris Hartl
6787f86803
Eliminate the import of DiploidGenotype, which switched public/private underneath me but for some reason didn't stop me from compiling...
2013-01-09 13:23:24 -05:00
Chris Hartl
c1de92b511
Add in some todo items
2013-01-09 13:16:06 -05:00
Chris Hartl
8d126161e2
Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable
2013-01-09 13:15:04 -05:00
Eric Banks
3a0dd4b175
Oops, I broke the build. NOW we shouldn't have any more public->protected dependencies.
2013-01-09 11:12:28 -05:00
Eric Banks
a921b06e02
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-09 11:06:17 -05:00
Eric Banks
4fa439d89e
Move some classes back to public because they are used in the engine. Move some test classes to protected. We should have no more public->protected dependencies now.
2013-01-09 11:06:10 -05:00
Ryan Poplin
396bce1f28
Reverting this change until we can figure out the right thing to do here.
2013-01-09 10:51:30 -05:00
Eric Banks
676e79542a
Bring CombineVariants back to public since it's used for SG. I needed to break ChromosomeCountConstants out of ChromosomeCounts to make this work.
2013-01-09 10:39:48 -05:00
Ryan Poplin
c87ad8c0ef
Bug fixes related to HC's GGA mode. Tracking just the artificial allele isn't sufficient when there are multiple GGA records that change the reference basis. Also, duplicated records screw up the tracking of merged alleles.
2013-01-09 10:00:46 -05:00
Chris Hartl
ad7c2a08d4
Normalize by the event type counts, not the total genotype counts: more useful normalization.
2013-01-09 09:12:41 -05:00
Chris Hartl
b56754606b
Initial break-out of GenotypeConcordance as a standalone walker. Some basic functionality testing. Currently performs only a pairwise comparison, but is very careful about proper tabulation through the GenotypeType enum.
2013-01-09 00:34:07 -05:00
Eric Banks
264cc9e78d
Resolve protected->public dependencies for BQSR by wrapping the BQSR-specific arguments in a new class.
...
Instead of the GATK Engine creating a new BaseRecalibrator (not clean), it just keeps track of the arguments (clean).
There are still some dependency issues, but it looks like they are related to Ami's code. Need to look into it further.
2013-01-08 16:23:29 -05:00
Eric Banks
ee7d85c6e6
Move around the DiploidGenotype classes (so they can be used by the GATKPaperGenotyper)
2013-01-08 15:53:11 -05:00
Eric Banks
0e2e672521
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-08 15:46:39 -05:00
Eric Banks
f0bd1b5ae5
Okay, all public->protected dependencies are gone except for the BQSR arguments. I'll need to think through this but should be able to make that work too.
2013-01-08 15:46:32 -05:00
Tad Jordan
9cbb2b868f
ErrorRatePerCycleIntegrationTest fix
...
-- sorting by row is required
2013-01-08 14:53:07 -05:00
Eric Banks
b099e2b4ae
Moving integration tests to protected
2013-01-08 09:34:08 -05:00
Eric Banks
dfe4cf1301
When merging the PerReadAlleleLikelihoodMap classes, I forgot to initialize the underlying objects. This was causing the LargeScaleTests to fail.
2013-01-08 09:24:12 -05:00
Eric Banks
9e6c2afb28
Not sure why IntelliJ didn't add this for commit like the other dirs
2013-01-07 18:11:07 -05:00
Ami Levy-Moonshine
3787ee6de7
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-07 17:07:29 -05:00
Eric Banks
47d030a52d
Oops, move the covariates over too
2013-01-07 15:47:25 -05:00
Eric Banks
35699a8376
Move bqsr utils to protected
2013-01-07 15:41:21 -05:00
Eric Banks
a0219acfaa
Collapse the PerReadAlleleLikelihoodMap classes into 1 now that Lite is gone
2013-01-07 14:55:21 -05:00
Eric Banks
35d9bd377c
Moved (nearly) all Walkers from public to protected and removed GATKLite utils
2013-01-07 14:42:40 -05:00
Ryan Poplin
4f95f850b3
Bug fix in the HC's allele mapping for multi-allelic events. Using the allele alone as a key isn't sufficient because alleles change when the reference allele changes during VariantContextUtils.simpleMerge for multi-allelic events.
2013-01-07 11:05:44 -05:00
Ami Levy-Moonshine
d3c2c97fb2
Merge branch 'master' of github.com:broadinstitute/gsa-unstable
2013-01-06 23:35:47 -05:00
Ami Levy-Moonshine
81eef3aa37
Merge development branches of log-less HMM and FastGatherer to master
2013-01-06 23:01:58 -05:00
Eric Banks
52067f0549
Handle merge conflicts
2013-01-06 12:29:12 -05:00
Chris Hartl
41bc416b65
Remove AAL and update MD5s.
2013-01-04 16:46:14 -05:00
Eric Banks
bce6fce58d
Resolving merge conflicts after Mark's latest push
2013-01-04 14:46:39 -05:00
Eric Banks
dd7f5e2be7
Hooking up the Bayesian estimate code for calculating Qemp in BQSR; various fixes after adding unit tests.
2013-01-04 14:43:11 -05:00
Mark DePristo
bbdf9ee91b
BQSR cleanup: merge Advanced and Standard recalibration engine into just the RecalibrationEngine
...
-- As we are no longer maintaining a public/protected system we need only have one RecalibrationEngine.
-- Misc. code cleanup and docs along the way
2013-01-04 11:39:24 -05:00
Mark DePristo
7df47418d8
BQSR optimization: make RecalibrationTables thread-local, and merge results in onTraversalDone
...
-- With the newer, faster BQSR, scaling was limited by the NestedIntegerArray. The solution to this is to make the entire table thread-local, so that each nct thread has its own data and doesn't have any collisions.
-- Removed the previous partial solution of having a thread-local quality score table
-- Added a new argument -lowMemory
2013-01-04 11:39:24 -05:00
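The thread-local approach in 7df47418d8 (each thread updates its own table, results are merged once at the end) can be sketched as follows. This is not the RecalibrationTables code: the flat long[] table, the class and method names, and the merge-by-summation step are assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

public final class ThreadLocalTableSketch {
    private final int bins;
    private final List<long[]> allTables = new ArrayList<>();
    private final ThreadLocal<long[]> localTable;

    ThreadLocalTableSketch(int bins) {
        this.bins = bins;
        // Each thread lazily gets its own table; the table is also registered
        // so that it can be found again when merging.
        this.localTable = ThreadLocal.withInitial(() -> {
            long[] table = new long[bins];
            synchronized (allTables) {
                allTables.add(table);
            }
            return table;
        });
    }

    // Hot path: no locking, because only the calling thread touches its table.
    void observe(int bin) {
        localTable.get()[bin]++;
    }

    // Called once after all workers have finished (the role onTraversalDone
    // plays in the commit message).
    long[] merge() {
        long[] merged = new long[bins];
        synchronized (allTables) {
            for (long[] table : allTables) {
                for (int i = 0; i < bins; i++) {
                    merged[i] += table[i];
                }
            }
        }
        return merged;
    }

    // Runs the given number of worker threads and returns the merged counts.
    static long[] runDemo(int threads, int updatesPerThread) {
        ThreadLocalTableSketch tables = new ThreadLocalTableSketch(2);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    tables.observe(i % 2);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tables.merge();
    }

    public static void main(String[] args) {
        long[] merged = runDemo(4, 1000);
        System.out.println(merged[0] + " " + merged[1]); // 2000 2000
    }
}
```

The trade-off is memory: every thread holds a full copy of the table, so the cost grows with the thread count.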
Chris Hartl
3753209584
One md5sum slipped past in the HC integration test.
2013-01-02 15:09:28 -05:00
Chris Hartl
e1d09ab0db
QD is now divided by the average length of the alternate allele (weighted by the allele count). The average length is stored in a related annotation, "AAL", which can be used to re-compute the "old" QD by simple multiplication. Integration tests *should* all pass.
2013-01-02 14:41:29 -05:00
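The length normalization in e1d09ab0db can be written out as a short calculation. The sketch assumes that "length" means the number of bases in each alternate allele and that the old QD value is already available; the class and method names are hypothetical.

```java
public final class QdNormalizationSketch {
    // Average alternate-allele length, weighted by each allele's count (the "AAL" value).
    static double averageAltLength(int[] altLengths, int[] alleleCounts) {
        if (altLengths.length != alleleCounts.length) {
            throw new IllegalArgumentException("One allele count is needed per alternate allele");
        }
        long weightedLength = 0;
        long totalCount = 0;
        for (int i = 0; i < altLengths.length; i++) {
            weightedLength += (long) altLengths[i] * alleleCounts[i];
            totalCount += alleleCounts[i];
        }
        if (totalCount == 0) {
            throw new IllegalArgumentException("Total allele count is zero");
        }
        return (double) weightedLength / totalCount;
    }

    // New QD = old QD / AAL, so old QD = new QD * AAL.
    static double normalizedQd(double oldQd, double aal) {
        return oldQd / aal;
    }

    public static void main(String[] args) {
        // One SNP allele (length 1, seen once) and one 3-base allele (seen 3 times).
        double aal = averageAltLength(new int[]{1, 3}, new int[]{1, 3});
        double newQd = normalizedQd(20.0, aal);
        System.out.println(aal);         // 2.5
        System.out.println(newQd);       // 8.0
        System.out.println(newQd * aal); // 20.0, the "old" QD recovered by multiplication
    }
}
```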
Eric Banks
275575462f
Protect against non-standard ref bases. Ryan, please review.
2012-12-26 15:46:21 -05:00