Guillermo del Angel
e724c63f2b
Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on
2012-02-17 17:18:43 -05:00
Guillermo del Angel
f2ef8d1d23
Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on
2012-02-17 17:15:53 -05:00
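The map problem mentioned in the two reverts above is a general one: removing entries from a hash map through the map itself while looping over it invalidates the iteration. In Java the usual fix is to remove through the map's Iterator (Iterator.remove()). The sketch below shows the same idea in Python and is not the GATK code; `drop_keys` and the sample data are hypothetical names chosen for illustration.

```python
def drop_keys(counts, should_drop):
    """Remove every key for which should_drop(key, value) is true, in place."""
    # Iterate over a snapshot of the keys so the dict can shrink safely.
    # Deleting while iterating the live dict raises RuntimeError in Python.
    for key in list(counts):
        if should_drop(key, counts[key]):
            del counts[key]
    return counts


# Hypothetical data: candidate alleles mapped to supporting read counts.
allele_counts = {"A": 3, "AT": 0, "ATT": 5, "ATTG": 0}
drop_keys(allele_counts, lambda key, value: value == 0)
print(allele_counts)
```

Copying the keys costs one extra list, which is usually acceptable; building a new filtered dict is an equally valid alternative when the original does not need to be mutated.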
Guillermo del Angel
3e031a540f
Solve merge conflict
2012-02-17 10:56:03 -05:00
Guillermo del Angel
cd352f502d
Corner case bug fix: if a read starts with an insertion, then when computing the consensus allele for calling, the insertion was only added to the last element in the consensus key hash map. Now, an insertion that partially overlaps with several candidate alleles increases the count for each of them
2012-02-17 10:21:37 -05:00
Guillermo del Angel
2f08846d82
Merged bug fix from Stable into Unstable
2012-02-14 21:26:25 -05:00
Guillermo del Angel
7dc6f73399
Bug fix for validation site selector: records with AC=0 were always being thrown out if the input VCF was sites-only, even when the -ignorePolymorphicStatus flag was set
2012-02-14 21:11:24 -05:00
Ryan Poplin
30085781cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-14 14:01:20 -05:00
Ryan Poplin
ae5b42c884
Put base insertions and base deletions in the SAMRecord as a string of quality scores instead of an array of bytes. Start of a proper genotype given alleles mode in HaplotypeCaller
2012-02-14 14:01:04 -05:00
David Roazen
8f7587048c
Update the expected novel TiTv in the HybridSelectionPipelineTest
...
The expected novel TiTv has changed for this set of variants now that
multi-allelic mode is on by default.
2012-02-13 20:25:52 -05:00
David Roazen
dfcdf92afa
Revert "Disable HaplotypeCaller integration tests in Stable"
...
These tests should remain enabled in Unstable.
This reverts commit 15c5b7aee1327f9dc012d2168f127a4700fe5064.
2012-02-13 16:37:31 -05:00
David Roazen
85d31f80a2
Merged bug fix from Stable into Unstable
2012-02-13 16:37:11 -05:00
David Roazen
d5fce22d78
Disable HaplotypeCaller integration tests in Stable
...
These tests use out-of-date files that no longer exist, and only
need to be enabled in Unstable for now.
2012-02-13 16:28:19 -05:00
David Roazen
03e5184741
Fix serious engine bug that could cause reads to be dropped under certain circumstances
...
When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine
file spans when it can. Unfortunately, the method that combines two BAM file
spans was seriously flawed, and would produce a truncated union if the file spans
overlapped in certain ways. This could cause entire regions of the BAM file containing
reads within the requested intervals to be dropped.
Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests
to verify that the correct union is produced regardless of how the file spans
happen to overlap.
Thanks to Khalid, who did at least as much work on this bug as I did.
2012-02-13 16:25:21 -05:00
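The commit above does not show the corrected GATKBAMFileSpan.union(), so the following is only a sketch of what a union that is correct "regardless of how the file spans happen to overlap" has to do. It assumes a file span can be modelled as a list of half-open (start, end) integer chunks; real BAM file spans use virtual file offsets, and the function name and representation here are hypothetical.

```python
def union(span_a, span_b):
    """Union of two file spans, each a list of half-open (start, end) chunks."""
    merged = []
    for start, end in sorted(span_a + span_b):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent chunk: extend the last one. Taking the
            # max matters when the new chunk is fully contained in the old
            # one; using `end` directly would truncate the union.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(union([(0, 10)], [(5, 20)]))     # partial overlap
print(union([(0, 100)], [(10, 20)]))   # one chunk contained in the other
print(union([(0, 5)], [(10, 15)]))     # disjoint chunks
```

The contained-chunk case is the kind of overlap where a naive implementation silently shortens the result, which is why unit tests should cover each overlap shape in both argument orders.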
Ryan Poplin
8742f5e36c
Updating BQSR scala script to take any number of known sites files and to use the scatter count input argument.
2012-02-13 15:44:30 -05:00
Eric Banks
ad90af94ed
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-13 15:10:10 -05:00
Eric Banks
0920a1921e
Minor fixes to splitting multi-allelic records (as regards printing indel alleles correctly); minor code refactoring; adding integration tests to cover +/- splitting multi-allelics.
2012-02-13 15:09:53 -05:00
Eric Banks
14981bed10
Cleaning up VariantsToTable: added docs for supported fields; removed one-off hidden arguments for multi-allelics; default behavior is now to include multi-allelics in one record; added option to split multi-allelics into separate records.
2012-02-13 14:32:03 -05:00
Ryan Poplin
e9338e2c20
Context covariate needs to look in the reverse direction for negative stranded reads.
2012-02-13 13:40:41 -05:00
Ryan Poplin
41ffd08d53
On the fly base quality score recalibration now happens up front in a SAMIterator on input instead of in a lazy-loading fashion if the BQSR table is provided as an engine argument. On the fly recalibration is now completely hooked up and live.
2012-02-13 12:35:09 -05:00
Eric Banks
c8c06c7753
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-11 23:02:19 -05:00
Eric Banks
ac9250b12b
Don't assume chrom20, just pull from the file list
2012-02-11 23:02:05 -05:00
Ryan Poplin
3caa1b83bb
Updating HC integration tests
2012-02-11 11:48:32 -05:00
Ryan Poplin
9b8fd4c2ff
Updating the half of the code that makes use of the recalibration information to work with the new refactoring of the bqsr. Reverting the covariate interface change in the original bqsr because the error model enum was moved to a different class and didn't make sense any more.
2012-02-11 10:57:20 -05:00
Eric Banks
f52f1f659f
Multiallelic implementation of the TDT should be a pairwise list of values as per Mark Daly. Integration tests change because the count in the header is now A instead of 1.
2012-02-10 14:15:59 -05:00
Mauricio Carneiro
f1990981fc
A little BQSR scala script to use with scatter/gather
2012-02-10 14:00:53 -05:00
Mauricio Carneiro
a7c6f255e9
Adding the old gatherer to BQSR
...
for now, the old gatherer will still work for us to scatter/gather our tests.
2012-02-10 13:33:57 -05:00
Mauricio Carneiro
1fb19a0f98
Moving the covariates and shared functionality to public
...
so Ryan can work on the recalibration on the fly without breaking the build. Supposedly all the secret sauce is in the BQSR walker, which sits in private.
2012-02-10 11:44:01 -05:00
Mark DePristo
48cc4b913a
bugfix for incremental refresh in gsafolkLSFLogs
2012-02-10 11:30:51 -05:00
Eric Banks
8ad4046642
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-10 11:18:02 -05:00
Mark DePristo
0722df46db
gsafolkLSFLogs creates a subset of LSF MySQL db related to gsafolk only
...
Creates a SQL table in the MySQL server calcium at the Broad that contains only
key information about the LSF usage of members of the gsafolk fairshare group.
Does this by first building a list of gsafolk uids, then selecting lsf info from matter's
table, and finally inserting this information into the gsafolk_lsf queue as part of the
GATK schema. The standard way to run this is with incremental refreshes enabled,
so that the program only fetches new raw lsf records with timestamps beyond the
max timestamp present in the GATK LSF table.
The default way to run this program is via cron with 'python private/python/gsafolkLSFLogs.py'
2012-02-10 11:12:20 -05:00
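The incremental refresh described above (fetch only raw records newer than the max timestamp already stored) can be sketched as follows. This is not the gsafolkLSFLogs.py script: it uses an in-memory SQLite database as a stand-in for the MySQL server, and the table names `raw_lsf` and `group_lsf`, the column names, and the uids are all hypothetical.

```python
import sqlite3


def incremental_refresh(conn, uids):
    """Copy raw records for the given uids that are newer than what we hold."""
    cur = conn.cursor()
    # Newest timestamp already copied; 0 when the target table is empty.
    (max_ts,) = cur.execute(
        "SELECT COALESCE(MAX(ts), 0) FROM group_lsf"
    ).fetchone()
    placeholders = ",".join("?" for _ in uids)
    cur.execute(
        "INSERT INTO group_lsf (uid, ts, cpu_sec) "
        "SELECT uid, ts, cpu_sec FROM raw_lsf "
        f"WHERE ts > ? AND uid IN ({placeholders})",
        [max_ts, *uids],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_lsf (uid TEXT, ts INTEGER, cpu_sec REAL);
    CREATE TABLE group_lsf (uid TEXT, ts INTEGER, cpu_sec REAL);
    """
)
conn.executemany(
    "INSERT INTO raw_lsf VALUES (?, ?, ?)",
    [("alice", 1, 10.0), ("bob", 2, 5.0), ("carol", 3, 7.0)],
)
group = ["alice", "carol"]
incremental_refresh(conn, group)            # first run copies ts 1 and 3
conn.execute("INSERT INTO raw_lsf VALUES ('alice', 4, 2.0)")
incremental_refresh(conn, group)            # second run copies only ts 4
rows = conn.execute("SELECT uid, ts FROM group_lsf ORDER BY ts").fetchall()
print(rows)
```

One consequence of keying on the max timestamp is that a raw record arriving late with an older timestamp would be skipped, so a scheme like this relies on the raw table being appended in timestamp order.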
Eric Banks
5e18020a5f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-10 11:08:33 -05:00
Eric Banks
f53cd3de1b
Based on Ryan's suggestion, there's a new contract for genotyping multiple alleles. Now the requester submits alleles in any arbitrary order - rankings aren't needed. If the Exact model decides that it needs to subset the alleles because too many were requested, it does so based on PL mass (in other words, I moved this code from the SNPGenotypeLikelihoodsCalculationModel to the Exact model). Now subsetting alleles is consistent.
2012-02-10 11:07:32 -05:00
Mauricio Carneiro
5af373a3a1
BQSR with indels integrated!
...
* added support for the base before a deletion in the pileup
* refactored covariates to operate on mismatches, insertions and deletions at the same time
* all code is in private so original BQSR is still working as usual in public
* outputs a molten CSV with mismatches, insertions and deletions, time to play!
* barely tested, passes my very simple tests... haven't tested edge cases.
2012-02-09 18:46:45 -05:00
Eric Banks
7a937dd1eb
Several bug fixes to new genotyping strategy. Update integration tests for multi-allelic indels accordingly.
2012-02-09 16:14:22 -05:00
Eric Banks
0f728a0604
The Exact model now subsets the VC to the first N alleles when the VC contains more than the maximum number of alleles (instead of throwing it out completely as it did previously). [Perhaps the culling should be done by the UG engine? But theoretically the Exact model can be called outside of the UG and we'd still want the context subsetted.]
2012-02-09 14:02:34 -05:00
Matt Hanna
aa097a83d5
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-09 11:26:48 -05:00
Matt Hanna
b57d4250bf
Documentation request by Eric. At each stage of the GATK where filtering occurs, added documentation suggesting the goal of the filtering along with examples of suggested inputs and outputs.
2012-02-09 11:24:52 -05:00
Ryan Poplin
5b3d875833
Incorporating Mark's suggestions on the AnalyzeCovariates plots.
2012-02-09 09:05:08 -05:00
Mauricio Carneiro
d561914d4f
Revert "First implementation of GATKReportGatherer"
...
Premature push on my part. Roger is still working on the new format and we need to update the other tools to operate correctly with the new GATKReport.
This reverts commit aea0de314220810c2666055dc75f04f9010436ad.
2012-02-08 23:28:55 -05:00
Ryan Poplin
270b160d87
Incorporating feedback from Mauricio on the plots
2012-02-08 16:26:32 -05:00
Ryan Poplin
4316437a62
Initial version of R script that will be called by new BQSR to replace the entire AnalyzeCovariates program
2012-02-08 16:00:12 -05:00
Eric Banks
2f800b078c
Changes to default behavior of UG: multi-allelic mode is always on; max number of alternate alleles to genotype is 3; alleles in the SNP model are ranked by their likelihood sum (Guillermo will do this for indels); SB is computed again.
2012-02-08 15:27:16 -05:00
Matt Hanna
51ac87b28c
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-08 08:43:55 -05:00
Matt Hanna
5b58fe741a
Retiring Picard customizations for async I/O and cleaning up parts of the code to use common Picard utilities I recently discovered.
...
Also embedded bug fix for issues reading sparse shards and did some cleanup based on comments during BAM reading code transition meetings.
2012-02-08 08:34:37 -05:00
Khalid Shakir
cda1e1b207
Minor manual merge update for List class to Seq interface usage.
2012-02-08 02:24:54 -05:00
Khalid Shakir
ef74363b1b
Merged bug fix from Stable into Unstable
2012-02-08 02:14:26 -05:00
Khalid Shakir
23e7f1bed9
When an interval list specifies overlapping intervals merge them before scattering.
2012-02-08 02:12:16 -05:00
Mauricio Carneiro
f30731f19b
these were not supposed to be committed. Pulling them out
2012-02-07 21:38:39 -05:00
Mauricio Carneiro
337819e791
disabling the test while we fix it
2012-02-07 19:22:32 -05:00
Roger Zurawicki
c0c676590b
First implementation of GATKReportGatherer
...
- Added the GATKReportGatherer
- Added private methods in GATKReport to combine Tables and Reports
- It is very conservative and it will only gather if the table columns match.
- At the column level it uses the (redundant) row ids to add new rows. It will throw an exception if it is overwriting data.
Added the gatherer functions to CoverageByRG
Also added the scatterCount parameter in the Interval Coverage script
Made some more GATKReport methods public
The UnitTest included shows that the merging methods work
Added a getter for the PrimaryKeyName
Fixed bugs that prevented the gatherer from working
Working GATKReportGatherer
Has only the functionality to addLines
The input file parser assumes that the first column is the primary key
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-02-07 18:14:47 -05:00
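The gathering rules listed in the commit above (gather only when columns match, add rows by row id, fail rather than overwrite) can be sketched like this. It is an illustration under assumptions, not the GATKReportGatherer: each table is modelled as a dict with a `columns` list and a `rows` dict keyed by primary key, and `gather_tables` is a hypothetical name.

```python
def gather_tables(tables):
    """Merge scattered report tables into one, refusing anything ambiguous."""
    first = tables[0]
    merged = {"columns": list(first["columns"]), "rows": dict(first["rows"])}
    for table in tables[1:]:
        # Conservative: only gather when the column layout is identical.
        if table["columns"] != merged["columns"]:
            raise ValueError("column mismatch; refusing to gather")
        for row_id, values in table["rows"].items():
            # The row id is the primary key; a repeat would overwrite data.
            if row_id in merged["rows"]:
                raise ValueError(f"row {row_id!r} would be overwritten")
            merged["rows"][row_id] = list(values)
    return merged


# Hypothetical scatter outputs: coverage per read group from two shards.
shard1 = {"columns": ["rg", "coverage"], "rows": {"rg1": ["rg1", 30]}}
shard2 = {"columns": ["rg", "coverage"], "rows": {"rg2": ["rg2", 28]}}
gathered = gather_tables([shard1, shard2])
print(gathered["rows"])
```

Raising on a duplicate row id, instead of summing or replacing, keeps the gatherer from hiding a scatter that produced overlapping output.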