Ryan Poplin
25532bdc37
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-26 11:43:32 -05:00
Ryan Poplin
390d493049
Updating ActiveRegionWalker interface to output a probability of active status instead of a boolean. Integrator runs a band-pass filter over this probability to produce actual active regions. First version of HaplotypeCaller which decides for itself where to trigger and assembles those regions.
2012-01-26 11:37:08 -05:00
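The band-pass step above is described only in prose. As a rough illustration, the Python sketch below turns a per-base activity probability into active regions by smoothing it with a centered moving average and keeping runs at or above a threshold. This is a swapped-in stand-in: a moving average is a low-pass filter, not the band-pass filter the commit mentions. `smooth`, `active_regions`, `threshold` and `half_width` are hypothetical names and defaults, not the GATK API.

```python
from typing import List, Optional, Tuple


def smooth(probs: List[float], half_width: int) -> List[float]:
    """Centered moving average; the window is truncated at the ends."""
    n = len(probs)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(probs[lo:hi]) / (hi - lo))
    return out


def active_regions(probs: List[float], threshold: float = 0.5,
                   half_width: int = 1) -> List[Tuple[int, int]]:
    """Half-open [start, end) index ranges whose smoothed
    activity probability is at or above `threshold`."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(smooth(probs, half_width)):
        if p >= threshold and start is None:
            start = i  # entering an active run
        elif p < threshold and start is not None:
            regions.append((start, i))  # leaving an active run
            start = None
    if start is not None:  # run extends to the last position
        regions.append((start, len(probs)))
    return regions
```

Smoothing is what suppresses an isolated spike: `active_regions([0, 0, 0, 1, 0, 0, 0])` yields no region, while a run of three active positions survives.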
Eric Banks
859dd882c9
Don't make it standard for now
2012-01-26 00:38:16 -05:00
Eric Banks
c5e81be978
Adding pairwise AF table. Not polished at all, but usable nonetheless.
2012-01-26 00:37:06 -05:00
Eric Banks
702a2d768f
Initial version of multi-allelic summary module in VariantEval
2012-01-25 19:42:55 -05:00
Eric Banks
9a60887567
Lost an import in the merge
2012-01-25 19:41:41 -05:00
Eric Banks
cba5f1a8b1
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-25 19:19:03 -05:00
Eric Banks
ddaf51a50f
Updated one integration test for indels
2012-01-25 19:18:51 -05:00
Eric Banks
add6918f32
Cleaner, more efficient way of determining the last dependent set in the queue.
2012-01-25 16:21:10 -05:00
Menachem Fromer
db645a94ca
Added options to make the batch-merger more all-inclusive: keep all indels and SNPs (even filtered ones) while maintaining their annotations. Also, VariantContextUtils.simpleMerge can now merge variants of all types using the Hidden non-default enum MultipleAllelesMergeType=MIX_TYPES.
2012-01-25 16:10:59 -05:00
Eric Banks
ef335a5812
Better implementation of the fix; PL index is now traversed in order.
2012-01-25 15:15:42 -05:00
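For context on "traversed in order": the VCF specification orders diploid PL values so that genotype j/k (with j <= k) sits at index k(k+1)/2 + j, giving 0/0, 0/1, 1/1, 0/2, 1/2, 2/2 for three alleles. The sketch below only illustrates that ordering; the commit's actual code is not shown here, and `pl_index` and `genotypes_in_pl_order` are hypothetical helpers.

```python
from typing import Iterator, Tuple


def pl_index(j: int, k: int) -> int:
    """Index of diploid genotype j/k in a VCF PL array (spec ordering)."""
    if j > k:
        j, k = k, j  # genotypes are unordered; normalize to j <= k
    return k * (k + 1) // 2 + j


def genotypes_in_pl_order(n_alleles: int) -> Iterator[Tuple[int, int]]:
    """Yield diploid genotypes (j, k), j <= k, in increasing PL index."""
    for k in range(n_alleles):
        for j in range(k + 1):
            yield (j, k)
```

Walking `genotypes_in_pl_order` visits PL indices 0, 1, 2, ... with no gaps, which is what makes in-order traversal of the PL array possible.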
Eric Banks
8e2d372ab0
Use remove instead of setting the value to null
2012-01-25 14:41:34 -05:00
Eric Banks
05816955aa
It was possible that we'd clean up a matrix column too early when a dependent column aborted early (with not enough probability mass) because we weren't being smart about the order in which we created dependencies. Fixed.
2012-01-25 14:28:21 -05:00
Eric Banks
2799a1b686
Catch exception for bad type and throw as a TribbleException
2012-01-25 12:15:51 -05:00
Eric Banks
96b62daff3
Minor tweak to the warning message.
2012-01-25 11:55:33 -05:00
Eric Banks
fb863dc6a7
Warn user when trying to run with EMIT_ALL_SITES with indels; better docs for that option.
2012-01-25 11:50:12 -05:00
Eric Banks
e349b4b14b
Allow appending with the dbSNP ID even if a (different) ID is already present for the variant rod.
2012-01-25 11:35:54 -05:00
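In VCF the ID column is a semicolon-separated list, with "." meaning missing, so appending a dbSNP ID to a record that already carries a different ID can be sketched as follows. `append_id` is a hypothetical helper, not the annotator's real method, and the de-duplication step is an assumption.

```python
def append_id(existing: str, new_id: str) -> str:
    """Add new_id to a VCF ID field, keeping any IDs already present."""
    if existing in ("", "."):
        return new_id  # field was empty: the dbSNP ID becomes the ID
    ids = existing.split(";")
    if new_id not in ids:  # avoid listing the same ID twice
        ids.append(new_id)
    return ";".join(ids)
```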
Eric Banks
ea3d4d60f2
This annotation requires rods and should be annotated as such
2012-01-25 11:35:13 -05:00
Ryan Poplin
bbefe4a272
Added option to be able to write out the active regions to an interval list file
2012-01-25 09:47:06 -05:00
Ryan Poplin
9818c69df6
Can now specify active regions to process at the command line, mainly for debugging purposes
2012-01-25 09:32:52 -05:00
Mauricio Carneiro
97499529c7
another small bug with the file extension.
2012-01-24 16:14:35 -05:00
Mauricio Carneiro
ffd61f4c1c
Refactor the Pileup Element with regards to indels
...
Eric reported this bug: reduced reads were failing with an index-out-of-bounds error on what we thought was a deletion but turned out to be a read starting with an insertion.
* Refactored PileupElement to distinguish clearly between deletions and reads starting with an insertion
* Modified ExtendedEventPileup to correctly distinguish elements with deletion when creating new pileups
* Refactored most of the lazyLoadNextAlignment() function of the LocusIteratorByState for clarity and to create clear separation between what is a pileup with a deletion and what's not one. Got rid of many useless if statements.
* Changed the way LocusIteratorByState creates extended event pileups to differentiate between insertions at the beginning of the read and deletions.
* Every deletion now has an offset (start of the event)
* Fixed bug when LocusIteratorByState found a read starting with an insertion that happened to be a reduced read.
* Separated the definitions of deletion/insertion (at the beginning of the read) in all UG annotations (and the annotator engine).
* Pileup depth of coverage for a deleted base will now return the average coverage around the deletion.
* Indel ReadPositionRankSum test now uses the deletion's true offset from the read; changed all appropriate md5s
* Because the Indel mode of the UG now properly reads the extra pileup elements, every subsequent call draws a different random number, so all RankSum tests have slightly different values (in the 10^-3 range). Updated all appropriate md5s after extremely careful inspection -- Thanks Ryan!
phew!
2012-01-24 16:07:21 -05:00
Matt Hanna
c312bd5960
Weirdly, PicardException inherits from SAMException, which means that our specialty code for
...
reporting malformed BAMs was actually misreporting any error that happened in the Picard layer
as a BAM ERROR.
Specifically changing PicardException to report as a ReviewedStingException; we might want to
change it in the future. I'll follow up with the Picard team to make sure they really, really
want PicardException to inherit from SAMException.
2012-01-24 15:30:04 -05:00
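The misreporting comes from handler ordering: a handler for a base exception type also catches every subclass. The Python sketch below mimics the situation with stand-in classes (the real exceptions are Java types; these definitions and the returned labels are illustrative only) and shows that the narrower type has to be checked first.

```python
class SAMException(Exception):
    """Hypothetical stand-in for the SAM library's base exception."""


class PicardException(SAMException):
    """Mirrors the inheritance described in the commit message."""


def classify_buggy(exc: SAMException) -> str:
    try:
        raise exc
    except SAMException:
        # Also catches PicardException, because it is a subclass.
        return "malformed BAM"


def classify_fixed(exc: SAMException) -> str:
    try:
        raise exc
    except PicardException:
        # The narrower type must be tested before its base class.
        return "ReviewedStingException"
    except SAMException:
        return "malformed BAM"
```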
Mauricio Carneiro
7c7ca0d799
fixing bug with fastq extension
...
* PPP only recognized .fasta and .fq, failing when the user provided a .fastq file. Fixed.
2012-01-24 11:02:15 -05:00
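A minimal sketch of extension-based input detection with `.fastq` accepted alongside `.fq`. The extension sets come from the commit message; `input_kind` is a hypothetical function, the case-insensitive match is an assumption, and the real pipeline is a Queue script rather than Python.

```python
import os

# Extensions taken from the commit message; ".fastq" was the missing one.
FASTA_EXTENSIONS = {".fasta"}
FASTQ_EXTENSIONS = {".fq", ".fastq"}


def input_kind(path: str) -> str:
    """Classify an input file by its (case-insensitive) extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in FASTA_EXTENSIONS:
        return "fasta"
    if ext in FASTQ_EXTENSIONS:
        return "fastq"
    raise ValueError(f"unrecognized sequence file extension: {path}")
```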
Mark DePristo
0a3172a9f1
Fix for ref 0 bases for Chris
...
-- Disturbingly, fixing this bug doesn't actually cause any test failures.
-- Wrote a new QCRefWalker to check in detail that the reference bases coming into the RefWalker are all correct, by comparing them against a clean, uncached load of the contig bases.
-- However, I cannot run this tool due to some kind of weird BAM error -- sending this on to Matt
2012-01-24 10:55:09 -05:00
Mauricio Carneiro
945cf03889
IntelliJ ate my import!
2012-01-23 21:46:45 -05:00
Mauricio Carneiro
2bb9525e7f
Don't set base qualities if fastQ is provided
...
* Pacbio Processing pipeline now works with the new fastQ files output by the Pacbio instrument
2012-01-23 17:57:29 -05:00
Khalid Shakir
c18beadbdb
Device files like /dev/null are now tracked as special by Queue and are not used to generate .out file paths, scattered into a temporary directory, gathered, deleted, etc.
...
Attempted workaround for xdr_resourceInfoReq unsatisfied link during loading of libbat.so.
2012-01-23 16:17:04 -05:00
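One way to recognize device files such as /dev/null is to check the file mode for a character or block device, as sketched below. Whether Queue uses this check or simply matches paths is not stated in the commit; `is_special_file` and `default_out_path` are hypothetical names, and the `.out` suffix rule only illustrates skipping log-path generation for devices.

```python
import os
import stat
from typing import Optional


def is_special_file(path: str) -> bool:
    """True for character or block devices such as /dev/null."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing files are treated as ordinary future outputs
    return stat.S_ISCHR(mode) or stat.S_ISBLK(mode)


def default_out_path(output_path: str) -> Optional[str]:
    """Derive a .out log path, or None when the output is a device."""
    if is_special_file(output_path):
        return None
    return output_path + ".out"
```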
Mark DePristo
02450e4b12
Merged bug fix from Stable into Unstable
2012-01-23 12:08:39 -05:00
Christopher Hartl
798596257b
Enable the Genotype Phasing Evaluator. Because it didn't have the same argument structure as the base class, update2 of VariantEvaluator was being called, rather than update2 of the actual module.
2012-01-23 10:50:16 -05:00
Mark DePristo
80a4ce0edf
Bugfix for incorrect error messages for missing BAMs and VCFs
...
-- Missing BAMs were appearing as StingExceptions
-- Missing VCFs were showing up as CommandLineErrors, but it's clearer for them to be CouldNotReadInputFile exceptions
-- Added integration tests to ensure missing BAMs, VCFs, and -L files are properly thrown as CouldNotReadInputFile exceptions
-- Added path to standard b37 BAM to BaseTest
-- Cleaned up code in SAMDataSource, removing my parallel loading code as this just didn't prove to be useful.
2012-01-23 09:52:07 -05:00
Guillermo del Angel
31d2f04368
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-23 09:23:03 -05:00
Guillermo del Angel
966387ca0b
Next intermediate commit in the pool caller. Lots of bug fixes, and now we can emit true VCFs with calls in discovery mode (still of unknown quality). The old validation mode is temporarily broken; it will be fixed in the next refactoring.
2012-01-23 09:22:31 -05:00
Christopher Hartl
4a08e8ca6e
Minor tweaks to T2D-related qscripts. Replacing old md5s from the BeagleIntegrationTest. All differences boiled down either to the accounting of changed genotypes (./. --> 0/0 is no longer a "changed" genotype) or to the representation of original no-call genotypes (OG=. rather than OG=./.).
...
This is somewhat of an arbitrary decision, and is negotiable. I could see treating
GT:PL ./.:.
differently from
GT:PL .:0,3,6
but am not sure it is worth doing so.
2012-01-23 08:25:34 -05:00
Ryan Poplin
4d6312d4ea
HaplotypeCaller is now an ActiveRegionWalker.
2012-01-22 14:31:01 -05:00
Christopher Hartl
3b1aad4f17
After a minor and abject freakout, alter the T2D script to seek out truth sensitivities between 80 and 100, rather than between 0.8 and 1. Also, don't consider a genotype "changed by beagle" if the initial genotype is a no-call.
2012-01-20 23:43:51 -05:00
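The rule that a no-call original genotype is never counted as "changed by beagle" can be sketched as follows. The helper names are hypothetical, and ignoring phasing ("0/1" versus "1|0") when comparing is an assumption rather than something the commit states.

```python
from typing import List


def _alleles(gt: str) -> List[str]:
    """Split a VCF GT string on '/' or '|' and sort, ignoring phasing."""
    return sorted(gt.replace("|", "/").split("/"))


def is_no_call(gt: str) -> bool:
    """True when every allele is missing, e.g. '.' or './.'."""
    return all(a == "." for a in _alleles(gt))


def changed_by_imputation(original_gt: str, new_gt: str) -> bool:
    """A no-call original is never counted as 'changed'."""
    if is_no_call(original_gt):
        return False
    return _alleles(original_gt) != _alleles(new_gt)
```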
Christopher Hartl
9b4f6afa21
Alterations to scripts for better performance. Grid search now expands the sens/spec tradeoff (90 was far too aggressive against hapmap chr20), and 20 max gaussians was too many and caused errors. For consensus genotypes: remember to gunzip the beagle outputs before converting to VCF. Also, beagle can in fact create 'null' alleles in certain circumstances. I'm not sure exactly what those circumstances are, but those sites should be ignored. When it does, all alleles appear to be set to null, so this should not affect the actual phasing in the output VCF.
2012-01-20 23:07:59 -05:00
Ryan Poplin
4b18786b5d
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-19 22:05:20 -05:00
Ryan Poplin
ace9333068
Active region walkers can now see the reads in a buffer around their active regions. This buffer size is specified as a walker annotation. Intervals are internally extended by this buffer size so that the extra reads make their way through the traversal engine, but the walker author only needs to see the original interval. Also, several corner-case bug fixes in active region traversal.
2012-01-19 22:05:08 -05:00
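The buffering scheme can be sketched as: pad the active region on both sides by the buffer (clamped to the contig), select reads against the padded interval, and hand the walker the unpadded region. `Interval`, `extend` and `reads_for_region` are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def overlaps(self, other: "Interval") -> bool:
        return (self.contig == other.contig
                and self.start <= other.end
                and other.start <= self.end)


def extend(region: Interval, buffer: int, contig_length: int) -> Interval:
    """Pad both sides by `buffer`, clamped to the contig bounds."""
    return Interval(region.contig,
                    max(1, region.start - buffer),
                    min(contig_length, region.end + buffer))


def reads_for_region(region: Interval, reads: List[Interval],
                     buffer: int, contig_length: int) -> List[Interval]:
    """Select reads against the padded interval; `region` itself is untouched."""
    padded = extend(region, buffer, contig_length)
    return [r for r in reads if r.overlaps(padded)]
```

Because `Interval` is frozen, the walker-facing region cannot be mutated by the padding step, which mirrors the commit's point that the walker author only sees the original interval.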
Menachem Fromer
066da80a3d
Added KEEP_UNCONDTIONAL option which permits even sites with only filtered records to be included as unfiltered sites in the output
2012-01-19 18:19:58 -05:00
Christopher Hartl
7f3ad25b01
Adding a mode to VariantFiltration to invalidate previously-applied filters to allow complete re-filtering of a VCF.
...
T2D VQSR: re-calling now done with appropriate quality settings and using BAQ.
2012-01-19 10:54:48 -05:00
Ryan Poplin
7e082c7750
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-19 09:11:23 -05:00
Christopher Hartl
39e6df5aa9
Fix edge case for very small VCFs
2012-01-19 00:51:28 -05:00
Christopher Hartl
1e037a0ecf
Ensure second-to-last line printed
2012-01-19 00:33:08 -05:00
Christopher Hartl
9946853039
Remove duplicated line
2012-01-19 00:25:22 -05:00
Christopher Hartl
cf9b1d350a
Some minor changes to in-process functions that nobody else uses. CGL now properly ignores no-calls for external VCFs.
2012-01-19 00:20:49 -05:00
Eric Banks
ab8f499bc3
Annotate with FS even for filtered sites
2012-01-18 22:04:51 -05:00
Guillermo del Angel
b123416c4c
Resolve stale merge changes
2012-01-18 20:56:36 -05:00
Guillermo del Angel
2eb45340e1
Initial, raw, mostly untested version of the new pool caller that also does allele discovery. Still needs debugging/refining. The main modification is a new operation mode, set by the argument -ALLELE_DISCOVERY_MODE, which, if true, determines the optimal alt allele at each computable site and computes the AC distribution on it. The current implementation does not work yet if there's more than one pool, and it only outputs biallelic sites; there is no functionality for true multi-allelics yet.
2012-01-18 20:54:10 -05:00
Ryan Poplin
0133d1a901
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-18 09:53:42 -05:00