Mark DePristo
034a997d07
Generalized Reads -> Fragment calculation
...
-- Supports ReadBackedPileup -> FragmentCollection as before
-- Added support for List<SAMRecord> -> FragmentCollection for Ryan's haplotype caller
-- General cleanup, renaming, move to separate package, more extensive unit tests, etc.
-- Added toFragment() function to ReadBackedPileup interface
2011-10-26 15:54:38 -04:00
Eric Banks
2f21b6ecfb
Removed debugging output
2011-10-26 15:50:20 -04:00
Eric Banks
b39fcb1bea
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-26 15:44:25 -04:00
Eric Banks
b6ce6ed3f8
Go around the ROD system for now so that we can just call decodeLoc() for efficiency. Noted that we should go through the ROD system once it gets cleaned up. This means that currently gzipped files are not supported with -L.
2011-10-26 15:42:53 -04:00
Eric Banks
3273c20c98
Added integration tests for Tribble-based intervals and fixed up some of the other tests based on some method changes.
2011-10-26 15:29:18 -04:00
Eric Banks
9424e8b2ca
Initial working version of new interval system in which the argument for -L (and -XL) is allowed to be a rod file (e.g. VCF). Old samtools-style intervals still behave as before. BTI is no longer supported. The merging (union or intersection) of intervals is now consistently applied to all -L (or -XL) intervals, which is nice. More testing needed.
2011-10-26 14:11:49 -04:00
Mark DePristo
7fa943aef1
Renamed FragmentPileup to FragmentUtils
2011-10-26 14:01:45 -04:00
Laurent Francioli
1f044faedd
- Genotype assignment in case of equally likely combinations is now random
...
- Genotype combinations with 0 confidence are now left unphased
2011-10-26 19:57:09 +02:00
Laurent Francioli
81b163ff4d
Indentation
2011-10-26 14:49:12 +02:00
Laurent Francioli
62cff266d4
GQ calculation corrected for most likely genotype
2011-10-26 14:40:04 +02:00
Mark DePristo
af3613cc5f
GATKSAMRecord commit branch summary
...
First, I'm sure there's a better way to do this, but I wanted to create a single commit summarizing the changes from my branch SamRecordFactory. What's the best way to do this? Rebase?
Now, on to the changes here:
-- Picard added a SamRecordFactory that is used to create instances of the subclass SamRecord or BAMRecord. This factory allows us to have low-level picard readers (SamFileReader) create objects of type GATKSamRecord. The abomination of the extends and contains GATKSamRecord is now gone. GATKSamRecords are now produced by this factory, the GATK provides this factory to our SamFileReaders, and everything works with GATKSamRecord just extending BAMRecord. This results in up to a 2x performance improvement in writing BAMs and a ~10% improvement when reading BAM files.
-- As a consequence of this, we no longer officially support SAM records. Attempting to create SAMRecord objects with the factory will throw a user exception.
-- Created a standard NGSPlatform enum, and GATKSamRecords support efficiently obtaining this value. The real BQSR (not the copy indel version) got the efficient code to use this. Please add all future platforms to this enum.
-- GATKSamRecord no longer supports using the OQ or defaultBaseQuality. This is performed in a wrapper iterator that's only added when these command line options are used.
-- ReducedRead code has been moved from ReadUtils into efficient caching accessors in GATKSamRecord.
-- ArtificialSamUtils now creates GATKSamRecords, not just SAMRecords. Added code here to create artificial pairs, and used that code to create artificial ReadBackedPileups with specific properties
-- New smarter algorithm for FragmentPileup. This new code is up to 3x faster than the previous version, and is lazy so is more efficient when no overlapping pairs are actually in the pileup. Created extensive DataProvider driven UnitTest. Added Caliper-based benchmarking system to characterize the performance differences between the old and new algorithms. TODO still remains to make an efficient version that works for non-pileups for the HaplotypeCaller
2011-10-25 20:52:56 -04:00
Mark DePristo
2822f0dc27
Merge branch 'SamRecordFactory'
2011-10-25 20:34:47 -04:00
Mark DePristo
1b722c21cf
merge master
2011-10-25 16:08:39 -04:00
Ryan Poplin
56fdf0b865
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-25 15:58:56 -04:00
Ryan Poplin
4a34c1862e
misc cleanup. We now filter out haplotypes when it is obvious that the assembly has failed to find a parsimonious event rather than use haplotypes with large numbers of SNPs and small indels on them.
2011-10-25 15:22:28 -04:00
David Roazen
2794e5c1d4
Modified the VCFJarClassLoadingUnitTest to play nice with the packaged-jar test targets.
2011-10-25 14:47:15 -04:00
Guillermo del Angel
b559936b7a
a) New variant eval stratification module for indel size. b) Next iteration on indel caller runtime optimization: when computing likelihood of each haplotype for a given read, many computations will be redundant since pieces of haplotypes will be common to both REF and ALT haplotypes. So, we keep HMM matrices from one haplotype to the next one and recompute starting at the part where either haplotype is different or GOP/GCP are different.
2011-10-25 09:56:43 -04:00
Khalid Shakir
fac9932938
Embedding gsalib source and queueJobReport R scripts in the dist and package jars.
...
Moved gsalib and queueJobReport.R to embeddable namespaced locations.
Updated packager dependencies/dir to add an @includes which filters the embedded fileset.
RScriptExecutor can now JIT compile the gsalib.
RScriptExecutor uses ProcessController and sends the Rscript output to java's stdout when run under -l DEBUG.
Refactored ProcessController and IOUtils from Queue to Sting Utils.
Added more unit tests to ProcessController along with a utility class to hard stop OutputStreams at a specified byte count.
Replaced uses of some IOUtils with Apache Commons IO.
ShellJobRunner refactored to use direct ProcessController and now kills jobs on shutdown.
Better QGraph responsiveness on shutdown by using Object.wait() instead of Thread.sleep().
2011-10-24 15:58:34 -04:00
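The shutdown-responsiveness change above (Object.wait() instead of Thread.sleep()) can be illustrated with a minimal sketch. PollingLoop and its members are hypothetical names chosen for this example, not classes from Queue; the point is only that a waiting thread can be woken immediately by notifyAll(), whereas a sleeping thread finishes its interval first.

```java
// Illustrative sketch only: PollingLoop is a hypothetical class, not Queue's QGraph.
final class PollingLoop {
    private final Object lock = new Object();
    private boolean shutdown = false; // guarded by lock

    /** Polls until shutdown is requested, waiting up to intervalMillis between polls. */
    void run(long intervalMillis) throws InterruptedException {
        synchronized (lock) {
            while (!shutdown) {
                // Unlike Thread.sleep(), wait() releases the lock and returns
                // as soon as another thread calls notifyAll() on it.
                lock.wait(intervalMillis);
            }
        }
    }

    /** Flags shutdown and wakes the polling thread right away. */
    void requestShutdown() {
        synchronized (lock) {
            shutdown = true;
            lock.notifyAll();
        }
    }

    /** Starts a loop with a 60 s interval and checks that shutdown ends it within 5 s. */
    static boolean stopsPromptly() {
        PollingLoop loop = new PollingLoop();
        Thread worker = new Thread(() -> {
            try {
                loop.run(60_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        loop.requestShutdown();
        try {
            worker.join(5_000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

The while loop re-checks the flag after every wake-up, so spurious wake-ups and a shutdown request that arrives before the first wait() are both handled.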
Khalid Shakir
89a581a66f
Added ability to specify arguments in files via -args/--arg_file
...
Pushing back downsample and read filter args so they show up in getApproximateCommandLineArgs()
2011-10-24 15:58:34 -04:00
Mark DePristo
502592671d
Cleanup FragmentPileup before main repo commit
...
-- removed intermediate functions. Now only original version and best optimized new version remain
-- Moved general artificial read backed pileup creation code into ArtificialSamUtils
2011-10-24 14:40:05 -04:00
Mark DePristo
166174a551
Google caliper example execution script
...
-- FragmentPileup with final performance testing
2011-10-24 14:04:53 -04:00
Laurent Francioli
62477a0810
Added documentation and comments
2011-10-24 13:45:21 +02:00
Laurent Francioli
38ebf3141a
- Now supports parent/child pairs
...
- Sites with missing genotypes in pairs/trios are handled as follows:
-- Missing child -> Homozygous parents are phased, no transmission probability is emitted
-- Two individuals missing -> Phase if homozygous, no transmission probability is emitted
-- One parent missing -> Phased / transmission probability emitted
- Mutation prior set as argument
2011-10-24 12:30:04 +02:00
Laurent Francioli
7312e35c71
Now makes use of standard Allele and Genotype classes. This allowed quite some code cleaning.
2011-10-24 10:25:53 +02:00
Laurent Francioli
01b16abc8d
Genotype quality calculation modified to handle all genotypes the same way. This is inconsistent with GQ output by the UG but is correct even for cases of poor quality genotypes.
2011-10-24 10:24:41 +02:00
Mark DePristo
f6ccac889b
Merged bug fix from Stable into Unstable
2011-10-23 16:37:12 -04:00
Mark DePristo
585a45b7a3
Bug fix for ClipReadsWalker when stats output isn't provided
...
-- See http://getsatisfaction.com/gsa/topics/clipreadswalker?utm_content=topic_link&utm_medium=email&utm_source=reply_notification
2011-10-23 16:36:48 -04:00
Ryan Poplin
f5d910b8a5
Haplotype caller now sends genotype likelihoods to the exact model to genotype the events found in the best haplotypes.
2011-10-23 13:29:08 -04:00
Mark DePristo
42bf9adede
Initial version of "fast" FragmentPileup code
...
-- Uses mayOverlapRoutine in ReadUtils
-- Attempts to be smart when doing overlap calculation, to avoid unnecessary allocations
-- PileupElement now comparable (sorts on offset then on start)
-- Caliper microbenchmark to assess performance
2011-10-22 21:36:37 -04:00
Mauricio Carneiro
4913f8a60f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-21 17:45:07 -04:00
Mauricio Carneiro
102dafdcbc
Validation of GATKSamRecord in read filters
...
Moved the validation of the GATKSamRecord to the MalformedReadFilter with the intent to make the read filter the ultimate validation location for sam records. This way we can opt to filter out malformed reads if we know what we are doing or blow up otherwise.
2011-10-21 17:40:43 -04:00
Guillermo del Angel
f4b409fa0d
CombineVariants bug fix: when merging records with disparate alleles we were leaving AC,AF fields intact. This had as a consequence that we could end up with a record with 3 alt alleles but only 2 values in AC,AF fields. Now, if alleles in combined vc are different from original, and if AC,AF fields can't be recomputed from genotypes, we remove attributes from vc map since they'll be invalid anyway. Integration test md5 changed since there were several badly merged records in result
2011-10-21 14:07:20 -04:00
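The AC/AF rule described in the CombineVariants fix above can be sketched roughly as follows. This is an assumption-laden illustration, not the CombineVariants code: MergedAttributes, dropStaleAlleleCounts, and the use of plain strings for alleles are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: drops AC/AF when the merged allele list differs from the
// original and the values cannot be recomputed from genotypes.
final class MergedAttributes {
    static Map<String, Object> dropStaleAlleleCounts(Map<String, Object> attributes,
                                                     List<String> originalAlleles,
                                                     List<String> mergedAlleles,
                                                     boolean canRecomputeFromGenotypes) {
        // Work on a copy so the caller's map is left untouched.
        Map<String, Object> result = new HashMap<>(attributes);
        if (!originalAlleles.equals(mergedAlleles) && !canRecomputeFromGenotypes) {
            // The old values describe a different allele list, so they would be invalid.
            result.remove("AC");
            result.remove("AF");
        }
        return result;
    }
}
```

Other attributes are kept as they are; only the two per-allele fields named in the commit message are removed.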
Mark DePristo
b863390cb1
Moving reduced read functionality into GATKSAMRecord
...
-- More functions take / produce GATKSAMRecords instead of SAMRecord
2011-10-21 13:28:05 -04:00
Mark DePristo
2403e96062
Renamed GATKSamRecord -> GATKSAMRecord for consistency. Better docs.
2011-10-21 09:59:24 -04:00
Mark DePristo
110e13bc1e
Merge branch 'master' into SamRecordFactory
2011-10-21 09:43:52 -04:00
Mark DePristo
be797a8a1f
Recalibrator now uses the much more efficient NGSPlatform in the cycle covariates system
2011-10-21 09:39:21 -04:00
Mark DePristo
ed74ebcfa1
GATKSamRecords with efficient NGSPlatform method
2011-10-21 09:38:41 -04:00
Mark DePristo
94e1898d8f
A canonical set of NGS platforms as enums with convenient manipulation methods
2011-10-21 09:37:45 -04:00
Laurent Francioli
edea90786a
Genotype quality is now recalculated for each of the phased Genotypes. Small problem is that we unnecessarily lose a little precision on the genotypes that do not change after assignment.
2011-10-20 17:04:19 +02:00
Laurent Francioli
1c61a57329
Original rewrite of PhaseByTransmission:
...
- Adapted to get the trio information from the SampleDB (i.e. from Pedigree file (ped)) => Multiple trios can be passed as argument
- Mendelian violations and trio phasing possibilities are pre-calculated and stored in Maps. => Runtime is ~3x faster
- Genotype combinations possible only given two MVs are now given a squared MV prior (e.g. 0/0+0/0=>1/1 is given 10^-16 prior if the MV prior is 10^-8)
- Corrected bug: In case the best genotype combination is Het/Het/Het, the genotypes are now set appropriately (before original genotypes were left even if they weren't Het/Het/Het)
- Basic reporting added:
-- mvf argument lets the user specify a file to report remaining MVs
-- When the walker ends, some basic stats about the genotype reconfiguration and phasing are output
Known problems:
- GQ is not recalculated even if the genotype changes
Possible improvements:
- Phase partially typed trios
- Use standard Allele/Genotype Classes for the storage of the pre-calculated phase
2011-10-20 13:06:44 +02:00
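The squared-prior arithmetic in the rewrite notes above (an MV prior of 10^-8 giving 10^-16 for a combination that needs two MVs) can be written out in log10 space. This is a sketch that assumes the two violations are treated as independent events; MvPrior and log10CombinationPrior are hypothetical names, not PhaseByTransmission code.

```java
// Hypothetical helper: prior for a genotype combination that requires a given
// number of Mendelian violations, assuming the violations are independent.
final class MvPrior {
    /**
     * @param log10MvPrior       log10 of the single-violation prior (e.g. -8.0 for 10^-8)
     * @param requiredViolations number of violations the combination needs
     * @return log10 of the combination prior, i.e. log10(prior^requiredViolations)
     */
    static double log10CombinationPrior(double log10MvPrior, int requiredViolations) {
        // Raising to a power is a multiplication in log space.
        return requiredViolations * log10MvPrior;
    }
}
```

For example, `MvPrior.log10CombinationPrior(-8.0, 2)` corresponds to the 0/0+0/0=>1/1 case in the commit message.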
Laurent Francioli
ef6a6fdfe4
Added getAsMap -> returns the likelihoods as an EnumMap with Genotypes as keys and likelihoods as values.
2011-10-20 12:49:18 +02:00
Laurent Francioli
76dd816e70
Added getParents() -> returns an arrayList containing the sample's parent(s) if available
2011-10-20 12:47:27 +02:00
Mark DePristo
999a8998ae
Constructor for GATKSamRecord with header only, for unit testing
2011-10-19 17:51:48 -04:00
Mark DePristo
3227143a1c
Systematic test code for FragmentPileup
...
-- Creates all combinations of overlapping and non-overlapping read pair pileups in all orientations and first/second pairings to validate fragment detection.
2011-10-19 17:50:27 -04:00
Mark DePristo
bba69701b5
Now creates GATKSamRecords, not SamRecords
2011-10-19 17:49:17 -04:00
Christopher Hartl
cd8a6d62bb
You know how the wiki has a big section on committing local changes to BRANCHES of the repository you clone it from? Yeah. It sucks if you don't do that.
...
This commit contains:
- IntronLossGenotyper is brought into its current incarnation
- A couple of simple new filters (ReadName is super useful for debugging, MateUnmapped is useful for selecting out reads that may have a relevant unaligned mate)
- RFA now matches my current local repository. It's in flux since I'm transitioning to the new traversal type.
+ the triggering read stash pilot required me to change the scope of some of the variables in the ReadClipping code, private -> protected. Those are all the changes there.
- MendelianViolation restored to its former glory (and an annotator module that uses the likelihood calculation has been added)
+ use this rather than a hard GQ threshold if you're doing MV analyses.
- Some miscellaneous QScripts
2011-10-19 17:42:37 -04:00
Mark DePristo
52345f0aec
Meaningful documentation string
2011-10-19 15:47:36 -04:00
Mark DePristo
1b38aa1a7e
Cleaning up reduced read code accessors
2011-10-19 15:46:44 -04:00
Eric Banks
d8d73fe4f2
Treat ./X genotypes as MIXED so that isHet, isHom, etc. still return the expected and correct values. Added docs to these accessors with contracts explicitly mentioned. Fixed case where NPE could be thrown.
2011-10-19 15:11:13 -04:00
Mark DePristo
7928b287fc
GATKSamRecord now produced by SAMFileReaders by default
...
-- Removed all of the unnecessary caching operations in GATKSAMRecord
-- GATKSAMRecord renamed to GATKSamRecord for consistency
2011-10-19 13:15:27 -04:00
Eric Banks
5a6468c11e
Allowing ./X genotypes and adding a unit test to ensure that this case is covered from now on (especially given that we may want to revert in the future). Reverting this change is really easy and entails uncommenting a few lines of code. But for now, despite Mark's objections, this case is allowed in the VCF spec and we are wrong not to allow it.
2011-10-19 11:52:05 -04:00
Eric Banks
48c4a8cb33
Make error messages clearer (even I was confused)
2011-10-19 11:49:16 -04:00
Eric Banks
6cadaa84c9
Just use validate() from super class since it does the same thing
2011-10-19 11:48:23 -04:00
Mark DePristo
df3e4e1abd
First working code to use SamRecordFactory to produce objects of our own design in SAMFileReader
2011-10-19 11:22:35 -04:00
Mauricio Carneiro
c27e2fb676
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-18 15:23:05 -04:00
Mark DePristo
f77f2eeb7d
Fix for new ID structure
2011-10-18 13:04:43 -04:00
Mark DePristo
1a92ee3593
No longer adds a binding of ID -> . when the ID field is dot in the VCF
...
-- Really we should make ID a primary key in VariantContext. Putting it into the attributes is just annoying now
2011-10-18 10:57:02 -04:00
Ryan Poplin
e45fcb66eb
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 15:56:19 -04:00
Ryan Poplin
1e6794c539
fixing typo in VariantsToTable docs
2011-10-17 15:56:02 -04:00
Mark DePristo
0de8550f17
Merged bug fix from Stable into Unstable
2011-10-17 15:29:53 -04:00
Mark DePristo
c1329c4dde
Fixing a binary to logical or
2011-10-17 15:29:45 -04:00
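The commit above does not say which expression was affected, so the following is only a generic illustration of why swapping a binary (bitwise) or for a logical or matters in Java. OrOperators and its methods are hypothetical names.

```java
// Generic illustration: | always evaluates both operands, || short-circuits.
final class OrOperators {
    /** Buggy form: the right-hand side runs even when s is null, causing an NPE. */
    static boolean isBlankBitwise(String s) {
        return s == null | s.isEmpty();
    }

    /** Fixed form: s.isEmpty() is skipped when s is null. */
    static boolean isBlankLogical(String s) {
        return s == null || s.isEmpty();
    }
}
```

Both forms give the same answer for non-null input; they differ only when evaluating the right-hand operand has a side effect or can throw.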
Mark DePristo
9e4963efc8
Merged bug fix from Stable into Unstable
2011-10-17 15:27:38 -04:00
Mark DePristo
ec911ce5bb
Even better error messages
2011-10-17 15:27:22 -04:00
Mark DePristo
d065bf1715
Merged bug fix from Stable into Unstable
2011-10-17 15:25:47 -04:00
Mark DePristo
a7cf9cdc67
Fixing error message typo
2011-10-17 15:25:35 -04:00
Ryan Poplin
589df6b7cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-17 14:35:14 -04:00
Ryan Poplin
6b02354d84
Adding a new getter in VariantsToTable to extract the indel event length.
2011-10-17 14:34:52 -04:00
Mark DePristo
3550798c4c
Merged bug fix from Stable into Unstable
2011-10-17 13:58:56 -04:00
Mark DePristo
4108a294f7
Better error message when a RodBinding file doesn't exist
2011-10-17 13:58:46 -04:00
Mark DePristo
cc76826f78
Merged bug fix from Stable into Unstable
2011-10-17 13:38:11 -04:00
Mark DePristo
09a09cacef
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-10-17 13:38:00 -04:00
Mark DePristo
fd4540cd32
Fixed extraordinarily subtle race condition with contracts invariant
...
-- all of the methods in the class must be synchronized or the internal state can be inconsistent with the contract invariant when entering the class in a non-synchronized method, even when that method doesn't care about the object's internal state
2011-10-17 13:37:55 -04:00
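The rule stated in the race-condition fix above (every method must be synchronized, even ones that do not care about internal state) can be sketched with a toy class. PairedCounter is hypothetical, and the invariant is checked by an ordinary method here rather than by the contracts framework the commit refers to.

```java
// Toy example: the invariant a == b is broken briefly inside increment(), so any
// method that can observe the object must hold the same lock.
final class PairedCounter {
    private long a = 0;
    private long b = 0; // invariant: a == b whenever no method is executing

    synchronized void increment() {
        a++; // invariant is temporarily false here
        b++; // and restored here
    }

    /** Synchronized even though it only reads, so it never sees the half-updated state. */
    synchronized boolean invariantHolds() {
        return a == b;
    }

    /** Runs a writer thread while checking the invariant; returns the number of violations seen. */
    static int countViolations(int iterations) {
        PairedCounter counter = new PairedCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                counter.increment();
            }
        });
        writer.start();
        int violations = 0;
        for (int i = 0; i < iterations; i++) {
            if (!counter.invariantHolds()) {
                violations++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return violations;
    }
}
```

If invariantHolds() were not synchronized, a check could land between the two increments and report a violation even though the class is otherwise correct.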
David Roazen
88d6b8bc1f
Merged bug fix from Stable into Unstable
2011-10-14 20:13:38 -04:00
David Roazen
bd8bb93811
Split RScriptExecutorUnitTest into public and private test classes.
...
We can't have a public test that depends on both public and private
code/data -- the new release system needs to do public-only tests,
and will catch this sort of thing.
2011-10-14 20:04:42 -04:00
David Roazen
4f01a742cb
Merged bug fix from Stable into Unstable
2011-10-13 21:39:52 -04:00
David Roazen
edfd6f8a06
Removing a public -> private dependency from the test suite.
...
The public integration test VariantContextIntegrationTest was dependent on the
private walker TestVariantContextWalker. Moved this walker to public/java/test
(NOT public/java/src, since this walker is only used by the test suite) to avoid
errors during public-only tests.
2011-10-13 21:32:52 -04:00
Mark DePristo
404ef741f1
Merged bug fix from Stable into Unstable
2011-10-13 18:02:06 -04:00
Mark DePristo
2ebdff074c
Update MD5s for SOLiD recalibration
...
-- MD5 db had spelling error; fixed
-- Bug in AlignmentUtils resulted in some bases not being color space corrected. The integration test caught the change, and it's clear that the new version is correct, as the prev. version was not considering the last N qualities for reads with an ND operation.
2011-10-13 18:01:51 -04:00
Mark DePristo
5a881360df
Merged bug fix from Stable into Unstable
2011-10-13 15:54:43 -04:00
Mark DePristo
7cab6f6bb0
Bug fixes for thread unsafe simple timer and bad Ns treatment in AlignmentUtils
...
-- SimpleTimer is now threadsafe using synchronized method keywords
-- Bug fix for alignmentToByteArray() where the N case was refPos++ not the now correct refPos += elementLength
2011-10-13 15:53:12 -04:00
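The N-handling fix above (refPos += elementLength instead of refPos++) can be illustrated with a simplified reference walk over CIGAR elements. This is not the alignmentToByteArray() code; ReferenceSpan and its char/int array representation of a CIGAR are assumptions made to keep the sketch self-contained.

```java
// Simplified sketch: advance a reference position across CIGAR elements.
final class ReferenceSpan {
    /**
     * @param ops     CIGAR operators, one per element (e.g. 'M', 'N', 'D', 'I', 'S')
     * @param lengths element lengths, parallel to ops
     * @return number of reference bases covered by the alignment
     */
    static int referenceSpan(char[] ops, int[] lengths) {
        int refPos = 0;
        for (int i = 0; i < ops.length; i++) {
            switch (ops[i]) {
                case 'M':
                case 'D':
                case 'N':
                    // A skipped region covers its full length on the reference;
                    // advancing by 1 here would be the bug described above.
                    refPos += lengths[i];
                    break;
                case 'I':
                case 'S':
                    // Insertions and soft clips consume read bases only.
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported operator: " + ops[i]);
            }
        }
        return refPos;
    }
}
```

For a read aligned as 5M100N5M, the walk covers 110 reference bases; with the old refPos++ behaviour for N it would have covered only 11.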
Mauricio Carneiro
e12ffb6547
Updating docs for GCContentByInterval
...
This walker does not take any BAMs. It only walks over the reference.
2011-10-13 13:27:00 -04:00
Eric Banks
9aecd50473
Adding ability to exclude annotations from the VA and UG lists. As described in the docs, this argument trumps all others (including -all) so that we can get around the SnpEff issue brought up by Menachem. Added integration test for it.
2011-10-12 15:44:54 -04:00
Mauricio Carneiro
e53a952aeb
Added ION Torrent support to CountCovariates.
2011-10-12 01:57:02 -04:00
Mauricio Carneiro
a2733a451f
Added NotCalled feature to GAV
...
Added "not called" and "no status" to the truth table. Very useful.
2011-10-11 19:31:45 -04:00
David Roazen
ae83420637
Merged bug fix from Stable into Unstable
2011-10-11 12:26:08 -04:00
David Roazen
794f275871
SnpEff is now marked as a RodRequiringAnnotation instead of an ExperimentalAnnotation.
...
Having SnpEff grouped with the Experimental annotations was proving problematic, since it
requires a rod. Placing it in its own group should improve the situation somewhat, making it
easier to request "all annotations except for SnpEff".
2011-10-11 12:08:56 -04:00
David Roazen
cfd0ac8410
Merged bug fix from Stable into Unstable
...
Conflicts:
public/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java
2011-10-11 12:03:51 -04:00
David Roazen
24b72334b3
UnifiedGenotyper now correctly initializes the VariantAnnotator engine.
...
This allows the annotation classes to perform any necessary initialization/validation.
For example, it allows the SnpEff annotator to (among other things) validate its rod binding.
This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding
is present.
Added an integration test to cover this case so that it doesn't break again.
2011-10-11 12:02:05 -04:00
Guillermo del Angel
0429b38021
Merged bug fix from Stable into Unstable
2011-10-11 11:19:38 -04:00
Guillermo del Angel
1c485d8b5e
Forgot that no matter how trivial a change it's a good idea to compile first
2011-10-11 11:18:41 -04:00
Guillermo del Angel
6418f4d69b
Merged bug fix from Stable into Unstable
2011-10-11 11:13:18 -04:00
Guillermo del Angel
1975de1b32
Second try: hide --do_indel_quality in AnalyzeCovariates
2011-10-11 11:11:29 -04:00
Guillermo del Angel
6506ea83e8
Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through.
...
This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.
2011-10-11 11:03:12 -04:00
Guillermo del Angel
4c1d8c8d44
Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users
2011-10-11 11:01:06 -04:00
Eric Banks
77c983c5b5
No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public.
2011-10-10 15:17:54 -04:00
Mark DePristo
fb72bcf732
DiffObjects no longer prints out the file name in the status so MD5 are stable
2011-10-10 15:10:57 -04:00
Mark DePristo
e3ff4f4266
Failing MD5 because output now contains absolute path
2011-10-10 11:05:02 -04:00
Mark DePristo
3e6c16d961
CombineVariants preserves allele order
2011-10-10 11:04:38 -04:00
Mark DePristo
a4bb842958
RankSum tests have slightly different MD5 results based on allele order
...
-- UG GENOTYPE_GIVEN_ALLELES now uses the order of alleles in the VCF, so this changes the MD5
2011-10-10 11:04:07 -04:00
Mark DePristo
46e7370128
this.allele, getAlleles(), and getAltAlleles() now return List, not Set
...
-- Changes associated code throughout the codebase
-- Updated necessary (but minimal) UnitTests to reflect new behavior
-- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC
2011-10-09 11:45:55 -07:00
Mark DePristo
822654b119
UnitTests for allele getting functions in VC in prep for move from set to list
2011-10-09 10:36:14 -07:00
Mark DePristo
c67f6c076b
simpleMerge now preserves allele order
...
-- UnitTests for dangerous PL merging cases in the multi-allelic case. The new behavior is correct
2011-10-08 17:39:53 -07:00
Mark DePristo
e94e6ba101
A UnitTest to ensure that the order of alleles is maintained
...
-> A, C, T and A, T, C are different and must be maintained. The constructors were doing this appropriately, so nothing needed to be changed
2011-10-08 08:47:58 -07:00
Mark DePristo
ec14a4a606
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-07 08:38:50 -07:00
Matt Hanna
6fbd41724a
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-07 11:20:00 -04:00
Matt Hanna
4514bc350f
More reliable way of finding the Tribble jar.
2011-10-07 11:19:29 -04:00
Eric Banks
181c76750e
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 22:38:55 -04:00
Eric Banks
ca9cd9b688
Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC.
2011-10-06 22:38:44 -04:00
Khalid Shakir
f91b015e0e
Made the BaseTest.testDir absolute
2011-10-06 22:33:21 -04:00
Mark DePristo
c7864c7256
Filter application order is now deterministic, in the order defined by the walker
...
-- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied. The order now is
(1) the order of the walker intrinsic filters
(2) read group black list (if provided)
(3) command line filters (if provided)
2011-10-06 18:51:40 -07:00
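The three-step ordering listed above can be sketched with an insertion-ordered list in place of a HashSet. FilterOrder and the use of plain strings to stand in for ReadFilter objects are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build the filter chain in a fixed, documented order.
final class FilterOrder {
    /**
     * @param walkerFilters      filters intrinsic to the walker, in walker-defined order
     * @param readGroupBlackList black-list filter, or null if none was provided
     * @param commandLineFilters filters requested on the command line, in argument order
     */
    static List<String> orderedFilters(List<String> walkerFilters,
                                       String readGroupBlackList,
                                       List<String> commandLineFilters) {
        // An ArrayList iterates in insertion order, unlike a HashSet.
        List<String> ordered = new ArrayList<>(walkerFilters);   // (1)
        if (readGroupBlackList != null) {
            ordered.add(readGroupBlackList);                     // (2)
        }
        ordered.addAll(commandLineFilters);                      // (3)
        return ordered;
    }
}
```

Because the result is a list, applying the filters by iterating over it gives the same order on every run.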
Mark DePristo
0b88af4af9
Counts of records failing filters are displayed sorted
...
-- Stops random ordering of the output, as the counts are returned sorted by string name of the class
-- Deleted now unused sh*tty accessors in Utils
2011-10-06 18:42:26 -07:00
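One simple way to get the sorted-by-class-name output described above is to copy the counts into a TreeMap keyed by name. This is a sketch under that assumption; FilterCounts and sortedByName are hypothetical, and the real code may sort differently.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: report filter counts in a stable, name-sorted order.
final class FilterCounts {
    /** Returns a copy of the counts whose iteration order is the natural String order of the keys. */
    static Map<String, Long> sortedByName(Map<String, Long> countsByFilterName) {
        return new TreeMap<>(countsByFilterName);
    }
}
```

Iterating over the returned map prints the filters in the same order regardless of how the input map was populated.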
Mark DePristo
d1e70d6ec2
Removed Nx counting of reads in metrics with -nt > 1
2011-10-06 18:29:26 -07:00
Eric Banks
c61804a450
Rename the long version of the argument name to more accurately reflect its purpose.
2011-10-06 16:14:04 -04:00
Eric Banks
61a3dfae24
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 15:58:04 -04:00
Eric Banks
6eb87bf58a
RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop.
2011-10-06 15:57:49 -04:00
Mark DePristo
6d9c210460
Updating MD5s for updated BAM with read groups
2011-10-06 12:15:48 -07:00
Mark DePristo
ab357ef900
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 10:50:02 -07:00
Eric Banks
1b0735f0a3
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 13:41:45 -04:00
Eric Banks
c4dfc1fb8b
Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt.
2011-10-06 13:41:36 -04:00
Matt Hanna
3961733590
Merged bug fix from Stable into Unstable
2011-10-06 12:54:52 -04:00
Matt Hanna
4fa5045e84
Abandoning classfileset/rootfileset approach due to difficulty managing
...
classloading of bcel*.jar/ant-apache-bcel*.jar. Switching instead to manually
specifying a minimal set of packages/classes to include in the vcf.jar via
build.xml, and adding a unit test which creates a limited classloader
only aware of vcf.jar and tribble.jar and tries to use it to load the core
classes in the vcf jar.
Hopefully third time's the charm.
2011-10-06 12:49:51 -04:00
Mark DePristo
73f9d1f217
GATK read group requirement iron hand
...
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExampleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo
23845ac798
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 08:17:08 -07:00
Mark DePristo
4b5b9155a9
Fixed bad expected value in PedReaderUnitTest
2011-10-06 08:16:47 -07:00
Mark DePristo
daa5999489
Fixed typo in argument description
2011-10-06 08:16:25 -07:00
Guillermo del Angel
8a474e38ff
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-06 10:08:39 -04:00
Guillermo del Angel
93f7e632bd
Minor fix/enhancement for VariantEval: if a vcf has symbolic alleles, program would crash ungracefully - now we'll just skip record without processing. This is a big issue since we can't process 1000G integration files with code as is.
2011-10-06 10:07:46 -04:00
Mark DePristo
190be4d0d1
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-10-05 21:27:11 -07:00
Mark DePristo
8e6845806a
Allowing empty samples list in LIBS
...
-- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record. Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly
2011-10-05 21:26:21 -07:00
Matt Hanna
180c8f286f
Merged bug fix from Stable into Unstable
2011-10-05 20:37:43 -04:00
Matt Hanna
55b9f06527
Ensure that IndelRealigner n-way out option supports MD5 generation.
2011-10-05 20:36:28 -04:00
Mark DePristo
be2d29ce69
Final PED documentation
2011-10-05 15:17:41 -07:00
Mark DePristo
3226d5dc0d
Merge branch 'master' into ped
2011-10-05 15:03:09 -07:00
Mark DePristo
6a573437af
Detailed documentation of arguments for -ped
2011-10-05 15:00:58 -07:00
Mark DePristo
e7c80f7c45
Renaming quantitative trait to OtherPhenotype which is now a String not a double
...
-- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles
2011-10-05 12:26:33 -07:00
Mark DePristo
51ecc20867
getFamily() and associated methods implemented and tested
...
-- Sample no longer serializable
-- Sample now implements Comparable
2011-10-05 09:55:05 -07:00
Mark DePristo
f4bac58f14
Merged bug fix from Stable into Unstable
2011-10-04 21:00:34 -07:00
Mark DePristo
d1d39943d0
Updating MD5 for BAMs that I added a read group to, part 2
2011-10-04 21:00:15 -07:00
Mark DePristo
9bd3ba4c7e
Missed one MD5
2011-10-04 16:04:52 -07:00
Mark DePristo
ffdfdcde3f
Updating MD5s
...
-- Interval test now uses RG containing BAM
-- DoC sample name ordering has changed.
2011-10-04 15:54:45 -07:00
Mark DePristo
a45d985818
TODO method stubs
2011-10-04 15:54:09 -07:00
Mark DePristo
463eab7604
All MD5 mismatches for test are shown
...
-- Now for tests like DoC, with 20 output md5s, you see all of the differences before failing.
2011-10-04 15:53:52 -07:00
Mark DePristo
c642a080d4
Merged bug fix from Stable into Unstable
2011-10-04 14:08:41 -07:00
Mark DePristo
941317167e
Updating MD5 for BAMs that I added a read group to
2011-10-04 14:08:00 -07:00
Mark DePristo
e1d6c7a50a
Updating MD5 that have changed due to sample ordering differences
2011-10-04 09:33:23 -07:00
Mark DePristo
343a7b6b2f
Updating UG integration tests for arbitrary impact of sample order changes on downsampling
2011-10-04 08:14:00 -07:00
Mark DePristo
fee89e47ff
Only throws an error when there are no samples but there are reads
...
-- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.
2011-10-04 06:50:54 -07:00
Mark DePristo
f552aede42
Only provide the sample names in the BAM file for efficiency
2011-10-04 06:50:12 -07:00
Mark DePristo
a27641e1fc
Cleaned up imports
2011-10-04 06:28:36 -07:00
Mark DePristo
b20689ff55
No longer supports extraProperties
...
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record. If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
2011-10-03 19:20:33 -07:00
Mark DePristo
867a7476c1
Systematic unit tests for the sample object
2011-10-03 19:09:02 -07:00
Mauricio Carneiro
3837aa45b4
Fixing conflicts
...
Conflicts:
public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java
2011-10-03 19:07:59 -07:00
Mark DePristo
2e3dc52088
Minor function renaming
2011-10-03 14:41:13 -07:00
Mark DePristo
dd71884b0c
On path to SampleDB engine integration
...
-- PedReader tag parser
-- Separation of SampleDBBuilder from SampleDB (now immutable)
-- Removed old sample engine arguments
2011-10-03 12:08:07 -07:00
Eric Banks
c3eff7451a
Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. The change made a noticeable (albeit small) difference when reading sites-only files.
2011-10-03 14:20:39 -04:00
Mark DePristo
8ee0f91904
Remove residual processing tracker arguments
2011-10-03 09:50:01 -07:00
Mark DePristo
89ac50e86e
SampleDataSource -> SampleDB
2011-10-03 09:33:30 -07:00
Mark DePristo
93fba06cb5
Support for whitespace only lines
2011-10-03 09:30:10 -07:00
Mark DePristo
0604ce55d1
PedReader support for ; separated lines, not only newline
2011-10-03 09:19:58 -07:00
Mark DePristo
52f670c8b8
100% version of PedReader
...
-- Passes all unit tests
-- Added unit tests for missing fields
2011-10-03 06:12:58 -07:00
Roger Zurawicki
bf6a3a6532
Added framework to do batch CigarClip Testing
...
*NOTE: This commit has not been compiled!
2011-10-02 22:33:46 -04:00
Mark DePristo
dd75ad9f49
95% PedReader
...
-- Passes significant unit tests
-- Implicit sample creation for mom / dad when you create single samples
-- Continuing cleanup of Sample and SampleDataSource
2011-09-30 18:03:34 -04:00
Andrey Sivachenko
c7898a9be7
inconsequential change in string constants printed into the vcf which no one uses anyway...
2011-09-30 16:40:21 -04:00
Mark DePristo
010899f886
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 15:51:09 -04:00
Mark DePristo
84160bd83f
Reorganization of Sample
...
-- Moved Gender and Afflication to separate public enums
-- PedReader 90% implemented
-- Improve interface cleanup to XReadLines and UserException
2011-09-30 15:50:54 -04:00
Mauricio Carneiro
05fba6f23a
Clipping ends inside deletion and before insertion
...
fixed.
2011-09-30 15:44:43 -04:00
Mark DePristo
c1cf6bc45a
PEDReader should be in samples
2011-09-30 14:22:19 -04:00
Mark DePristo
56f10b40a8
Fixing test bugs for WindowMaker that required empty sample list
2011-09-30 14:18:27 -04:00
Ryan Poplin
af6c053435
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 13:33:31 -04:00
Mark DePristo
810e8ad011
Removed getXByReaders() function from the engine
...
-- These could be simplified in their downstream uses
-- Or they could be replaced with a generic getSAMFileHeaders() function and then apply the getSamples(header) as desired downstream
2011-09-30 10:43:51 -04:00
Mark DePristo
178ba24c27
Move getSamplesForSamFile to SampleUtils
...
-- A nearly identical piece of code already lived in SampleUtils. Now there are two functions, one taking a regular header and another grabbing the merged header from the GATK engine itself. Much cleaner
2011-09-30 10:28:18 -04:00
Mark DePristo
30d23942b1
Renamed ReadBackedPileup getXSampleName() functions to getXSample
...
-- now that we don't have Sample objects floating around we don't have to have all of the Name extensions on our functions
2011-09-30 10:02:57 -04:00
Mark DePristo
3289a325fc
Removed final use of Sample in RBP
2011-09-30 09:57:39 -04:00
Mark DePristo
a69a4dda2f
SamplesDB no longer has null sample
...
-- Updated getSamples().size() == 2 test in CallableLociWalker that really ensured there was one sample in the system
2011-09-30 09:56:23 -04:00
Mark DePristo
e055a78f6e
LIBS now requires at least one sample be present
...
-- UnitTest provides a "null" sample for matching the reads without read groups
2011-09-30 09:49:35 -04:00
Mark DePristo
9860a2c989
Merge branch 'master' into ped
2011-09-30 09:28:18 -04:00
Mark DePristo
d901fed617
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-30 08:41:44 -04:00
Mauricio Carneiro
cabacf028d
Intermediate commit to fix interval skipping
...
may need additional testing.
2011-09-29 18:45:12 -04:00
Mark DePristo
b71b51751e
Bug fix for UnitTest
...
-- Provide the null sample to the LIBS, as this seems to be required for correctly passing this unit test
-- Will be fixed in a future update
2011-09-29 17:30:01 -04:00
Mark DePristo
1765fbeb6b
Merge branch 'master' into ped
2011-09-29 17:18:51 -04:00
Mark DePristo
98ecaf8aa0
Support for ReducedReads with reduced counts and average quals
...
-- ReadUtils and UnitTest updated to support new byte[] style
-- Removed unnecessary read transformer in PairHMM
2011-09-29 17:18:39 -04:00
Mauricio Carneiro
9508220157
fixed hard clipping both ends inside deletion
...
If both ends of the interval fall within a deletion in the read, then hardClipBothEnds would cut the right tail first, including the entire deletion, and then fail to cut the left tail because there would not be any bases there anymore. Fixed.
2011-09-29 15:36:49 -04:00
Mark DePristo
9458f01409
Test cleanup of Sample object
2011-09-29 15:13:05 -04:00
Mark DePristo
625ffb6a07
LocusIteratorByState and ReadBackedPileups no longer use Sample
2011-09-29 14:52:11 -04:00
Mark DePristo
b3a2371925
Merge branch 'master' into ped
2011-09-29 14:32:17 -04:00
Mark DePristo
68761a6e28
Removed sample from header
2011-09-29 14:13:05 -04:00
Mauricio Carneiro
a5e75cd14c
Outputting both consensus base qualities and counts
...
The base qualities of a consensus read are now the average quality of the bases forming the consensus base (most common base), and the consensus quality tag now carries an array with the counts of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.
2011-09-29 12:54:41 -04:00
Mark DePristo
505416b6c0
Merge branch 'master' into ped
2011-09-29 12:22:39 -04:00
Mauricio Carneiro
4086fa768f
Disabling all ReadClipperUnitTests
2011-09-29 12:20:35 -04:00
Mark DePristo
9536845e35
Cleaning up unused code in MV
2011-09-29 12:20:07 -04:00
Mark DePristo
5043d76c3d
Removing more bad uses of SampleDataSource creation
2011-09-29 12:16:34 -04:00
Mark DePristo
5c9227cf5e
Further cleanup of Sample database
...
-- Removing more and more unnecessary code
-- Partial removal of type safe Sample usage. On the road to SampleDB only
2011-09-29 11:50:05 -04:00
Mark DePristo
2a0cd556d3
Further cleanup of Sample
...
-- Cleaned up interface functions in GAE
-- Added Walker.getSampleDB() function which is an easier option for tools to get the samples db
2011-09-29 10:34:51 -04:00
Mark DePristo
e76f381628
Moved sample package from DataSources to gatk, and renamed it samples
...
-- All associated changes to the codebase are just header updates
2011-09-29 09:57:15 -04:00
Mark DePristo
e197dcd1f3
Pre-cleanup commit of Sample and SampleDataSource
...
-- SampleDataSource has all reader functionality disabled
2011-09-29 09:44:18 -04:00
Mark DePristo
4d31673cc5
No longer supporting YAML file allows us to delete 75% of the sample's codebase
2011-09-29 09:43:31 -04:00
Ryan Poplin
e366ee18bc
Adding ability to read in and make use of kmer quality tables during HMM evaluation
2011-09-29 07:46:19 -04:00
Mauricio Carneiro
fc86cd6fd8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/carneiro/gatk/RR into rr
2011-09-29 00:12:15 -04:00
Roger Zurawicki
4fd5630f6a
Added ReadClipper Unit Test
...
* Includes tests that include HardClip to Read and Reference Coords.
* Changed ReadUtils.HardClipByReferenceCoordinates from private to protected to allow for testing
2011-09-28 23:13:50 -04:00
Matt Hanna
9272ed03b5
Merged bug fix from Stable into Unstable
2011-09-28 21:26:43 -04:00
Matt Hanna
0acaf2df65
Fix an embarrassing issue where a specific configuration of minimal coverage
...
over small intervals could cause reads to be dropped from the pileup. Nothing
to see here...
2011-09-28 21:23:01 -04:00
Guillermo del Angel
c8d3a720f9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 18:17:34 -04:00
Guillermo del Angel
7e3cb45093
Further performance optimization in banded HMM; now about a 60% speed improvement over the current implementation
2011-09-28 16:27:28 -04:00
Ryan Poplin
1b1ca80df2
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 16:17:39 -04:00
Ryan Poplin
3b73dc89fe
Making several esoteric arguments in the BQSR @Hidden. Adding basic support for Complete Genomics machine cycle.
2011-09-28 16:17:31 -04:00
Mauricio Carneiro
ff2f4df043
Fixed hardclipping inside indel (right tail)
...
when hard clipping the right tail of a read falls inside a deletion, clipping should fall back to the last base before the deletion to follow the ReadClipper's contract.
2011-09-28 16:07:34 -04:00
Mauricio Carneiro
3c7b7f74ef
Optimized interval iteration
...
Using a TreeSet to manipulate getToolkit.getIntervals() and being smart about which intervals to test makes interval clipping O(1) instead of O(n).
2011-09-28 16:07:34 -04:00
Mauricio Carneiro
5c9b659c02
clipping both ends of the reads was modifying the original read
...
This goes against the ReadClipper contract, and was affecting the second part of the read that spans over multiple intervals. Fixed.
2011-09-28 16:07:34 -04:00
Guillermo del Angel
fe23e4d10c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 15:53:11 -04:00
Guillermo del Angel
e2b9030e93
First mostly fully functional implementation of banded pair HMM likelihood computation for indel caller. More experimentation to follow, but right now it works on small data sets and at least it doesn't break existing things. Disabled by default at this point
2011-09-28 15:51:48 -04:00
Eric Banks
1b45f21774
Removing this command-line tool. Purposely not doing this in stable so that users who may still use it have time to find other options. But the docs are no longer on the wiki.
2011-09-28 13:18:32 -04:00
Eric Banks
1f0e354fae
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-28 13:13:21 -04:00
Eric Banks
bb619a9a3c
Fixing docs
2011-09-28 13:13:03 -04:00
Mark DePristo
5812004e06
Merge branch 'stable'
2011-09-28 11:36:40 -04:00
Mark DePristo
a5006831d7
Shows "" not empty space when default string value is ""
2011-09-28 11:35:52 -04:00
Mark DePristo
1e32281a15
Fix to not show -null when missing short name argument
2011-09-28 11:31:20 -04:00
Mauricio Carneiro
89544c209c
Fixing contracts
...
changed return type to Pair, changing contracts accordingly.
2011-09-28 11:19:17 -04:00
Eric Banks
eacbee3fe5
Merged bug fix from Stable into Unstable
2011-09-27 20:35:18 -04:00
Eric Banks
43b0c98298
Fix docs
2011-09-27 20:34:46 -04:00
Eric Banks
232a6df11c
Add longhand form to the error message.
2011-09-27 20:29:31 -04:00
Eric Banks
1d6fcb6eb1
Revert "Add longhand form to the error message to prevent users from posting borderline dumb posts to GS."
...
This reverts commit 75b2600527cfce05ae683cb394290ff2a80e8552.
2011-09-27 20:27:00 -04:00
Eric Banks
269b9826b6
Add longhand form to the error message to prevent users from posting borderline dumb posts to GS.
2011-09-27 20:26:36 -04:00
Mauricio Carneiro
3b6e43b7c4
Use reads that span multiple intervals
...
* RR will now compress reads that span across multiple intervals correctly and output them in the correct order.
* Fixed bug in getReadCoordinateForReferenceCoordinate where, if the requested reference coordinate fell inside a deletion in the read, the read would be clipped up to one element past the deletion.
2011-09-27 18:39:06 -04:00
Khalid Shakir
84bd355690
Merged bug fix from Stable into Unstable
2011-09-27 14:34:39 -04:00
Khalid Shakir
b090751f62
Fixed Ant / PluginManager issue where reflections was picking up all class files under current working directory due to "." in jar manifest classpaths.
...
Updates to HybridSelectionPipeline:
- Added annotations back via snpEff
- Minor updates to VQSR paths and lowered memory
2011-09-27 14:33:57 -04:00
Eric Banks
26e71f6688
The Omni files have multiple records (with the same ALT) at a particular location, with one PASSing and the other(s) filtered. Chris, this is why using this file as both eval and comp leads to ref/no-call cells in the GenotypeConcordance table. However, this led to non-determinism in VE because the VCs were placed in a HashSet; we use a LinkedHashMap instead to bring back determinism.
2011-09-27 11:03:17 -04:00
Guillermo del Angel
ceffefa6a6
Intermediate version with banded pair HMM
2011-09-27 10:18:58 -04:00
Mark DePristo
e99ff3caae
Removed lots of old, and not to be used, HMM options
...
-- resulted in massive code cleanup
-- GdA will integrate his new banded algorithm here
-- Removed: DO_CONTEXT_DEPENDENT_PENALTIES, GET_GAP_PENALTIES_FROM_DATA, INDEL_RECAL_FILE, dovit, GSA_PRODUCTION_ONLY
2011-09-27 10:08:40 -04:00
Mark DePristo
fa0efbc4ca
Refactoring of PairHMM to support reduced reads
2011-09-26 13:28:56 -04:00
Mark DePristo
a6b65d6347
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-26 13:26:21 -04:00
Mark DePristo
4f09453470
Refactored reduced read utilities
...
-- UnitTests for key functions on reduced reads
-- PileupElement calls static functions in ReadUtils
-- Simple routine that takes a reduced read and fills in its quals with its reduced qual
2011-09-26 12:58:31 -04:00
Eric Banks
234b74dd05
Merged bug fix from Stable into Unstable
2011-09-26 11:47:23 -05:00
Eric Banks
317b95fa57
Fixing some annotator docs
2011-09-26 11:46:45 -05:00
Mauricio Carneiro
b76dbc72f0
Fixed interval navigation bug.
...
If a read was hard clipped away from the current interval, all subsequent reads within that interval (not hardclipped) would be filtered out. Fixed.
2011-09-26 08:13:44 -04:00
Guillermo del Angel
9afccd11b1
Minor refactoring: add ability to MathUtils.normalizeFromLog10 to not go to linear domain but just subtract max value from log values and return. Use this function in snp and indel GL computation.
2011-09-25 21:18:56 -04:00
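The normalization described in commit 9afccd11b1 can be sketched as follows. This is a minimal illustration, not the actual MathUtils code: the class name `Log10NormalizerSketch`, the method name `normalize`, and the boolean flag `keepInLogSpace` are assumptions made for the example, since the log does not show the real signature.

```java
// Hypothetical sketch; names and signature are assumptions, not the GATK API.
final class Log10NormalizerSketch {
    /**
     * Shifts log10 values so that the largest becomes 0. If keepInLogSpace is
     * false, additionally converts to linear space and scales to sum to 1.
     */
    static double[] normalize(double[] log10Values, boolean keepInLogSpace) {
        // Find the maximum log10 value; subtracting it avoids underflow later.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }

        double[] shifted = new double[log10Values.length];
        for (int i = 0; i < log10Values.length; i++) {
            shifted[i] = log10Values[i] - max;
        }
        if (keepInLogSpace) {
            // Stay in the log domain: the best value is now exactly 0.
            return shifted;
        }

        // Otherwise go to the linear domain and normalize to a sum of 1.
        double sum = 0.0;
        double[] linear = new double[shifted.length];
        for (int i = 0; i < shifted.length; i++) {
            linear[i] = Math.pow(10.0, shifted[i]);
            sum += linear[i];
        }
        for (int i = 0; i < linear.length; i++) {
            linear[i] /= sum;
        }
        return linear;
    }
}
```

Staying in the log domain keeps the largest value at exactly zero, which is the same shape as PL-style likelihoods where the best entry is always zero.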
Guillermo del Angel
3eef800889
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-24 21:20:11 -04:00
Guillermo del Angel
4707ab4a7d
Added unit tests to test genotype merges with PL's
2011-09-24 21:17:15 -04:00
Guillermo del Angel
203517fbb7
a) Cleanups/bug fixes to previous commit to CombineVariants.
...
b) Change md5 to reflect records that are now merged correctly.
c) Change unit merge alleles test to reflect the fact that a null non-variant vc object is not valid and not supported because there's no way to codify such an object in a vcf. The code correctly converts this to a non-variant single-base event with whatever the reference is at that location.
2011-09-24 19:08:00 -04:00
Mauricio Carneiro
c31f4cb2f6
Cleaning leading insertions
...
With the current implementation, a read cannot start with a deletion or an insertion. Maybe this will change in the future, but for now, chop the leading insertion off.
2011-09-24 14:33:32 -04:00
Guillermo del Angel
cd058dd10f
a) Fixed md5 for legit change in UG output that now also no-calls genotypes w/0,0,0 in PL's in SNP case.
...
b) First reimplementation of new vc merger of different types. Previous version did it in two steps, first merging all vc's per type and then trying to see if resulting vc's would be merged if alleles of one type were a subset of another, but this won't work when uniquifying genotypes since sample names would be messed up and GT sample names wouldn't match VC sample names. Now, it's actually simpler: when splitting vc's by type before merging, we check for alleles of one vc being a subset of alleles of vc of another type and if so we put them together in same list.
2011-09-24 13:40:11 -04:00
Mark DePristo
bb11951255
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-24 09:26:45 -04:00
Mark DePristo
8d9e136bba
Merge branch 'stable'
2011-09-24 09:26:28 -04:00
Mark DePristo
6804ab6d2f
Bug fix for NPE in very short GATK runs
...
-- Was already in unstable, but not stable...
2011-09-24 09:25:29 -04:00
Mark DePristo
92acff46e5
Moved Haplotype into Utils root
2011-09-24 09:14:05 -04:00
Mark DePristo
f792353dcd
Framework for genotype unit test
2011-09-24 08:56:45 -04:00
Mark DePristo
c0bb0cb465
Make DiploidGenotype enum private to walkers.genotyper
2011-09-24 08:48:33 -04:00
Guillermo del Angel
3a4469a236
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-23 21:58:34 -04:00
Guillermo del Angel
0e74cc3c74
a) Treat SNP genotype likelihoods just as indels, in the sense that they're always normalized as PL's so one of them will always be zero. This creates minor numerical differences in Qual and annotations due to numerical approximations in AF computation.
...
b) Intermediate CombineVariants fixes, not ready yet
2011-09-23 21:58:20 -04:00
Khalid Shakir
1803bd6ae2
Merged bug fix from Stable into Unstable
2011-09-23 21:05:00 -04:00
Khalid Shakir
8ceb93b8ac
Fixed an integration test which crashed on the out of date LSF DRMAA library when run against the obsolete LSF dotkit instead of .combined_LSF_SGE
2011-09-23 21:03:22 -04:00
Mauricio Carneiro
7cac75ae1d
Merged bug fix from Stable into Unstable
2011-09-23 19:00:43 -04:00
Mauricio Carneiro
fbe3c1e0b3
Adding warning on HardClipping
...
Hard Clipping is still under heavy development and should not be used by anyone less prepared than MacGyver.
2011-09-23 19:00:19 -04:00
Mark DePristo
b66841f179
Static cache for binomial probability
...
-- Very low level performance optimization
2011-09-23 17:29:34 -04:00
Mauricio Carneiro
1a45c331b2
bringing the latest bug fixes to Reduce Reads
2011-09-23 16:40:06 -04:00
Mauricio Carneiro
9ea40f2e41
Deletions/Insertions in hard clip and bug fixes
...
* Deletions now count as hard clipped bases in order to recover the original alignment start of a clipped read.
* Insertions do not count as hard clipped bases for the same reason.
* This created a bug in the previous cigar cleaning function. Fixed.
2011-09-23 16:37:08 -04:00
David Roazen
40202c85e0
Merged bug fix from Stable into Unstable
2011-09-23 16:35:55 -04:00
David Roazen
e1cb5f6459
SnpEff annotator now assigns a functional class to each effect and distinguishes between actual effects and mere modifiers.
...
-We now assign a functional class (nonsense, missense, silent, or none) to each SnpEff effect, and add a
SNPEFF_FUNCTIONAL_CLASS annotation to the INFO field of the output VCF.
-Effects are now prioritized according to both biological impact and functional class, instead of impact only.
-Many of SnpEff's "low-impact" effects are now classified as "modifiers" with lower priority than every
other effect. This includes such "effects" as DOWNSTREAM, UPSTREAM, INTRON, GENE, EXON, and others that
really describe the location of the variant rather than its biological effect.
This code will be short-lived (likely 1.2-only), as the next version of SnpEff will include most of these
features directly.
Checking this change into Stable+Unstable instead of Unstable because the current functional class stratification
in VariantEval is basically broken and urgently needs to be fixed for production purposes.
2011-09-23 16:06:52 -04:00
Matt Hanna
e388c357ca
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-23 14:53:28 -04:00
Matt Hanna
cc23b0b8a9
Fix for recent change modelling unmapped shards: don't invoke optimization to combine mapped and unmapped shards.
2011-09-23 14:52:31 -04:00
Mark DePristo
e3d4efb283
Remove N2 EXACT model code, which should never be used
2011-09-23 11:55:21 -04:00
Mark DePristo
27ce3c822e
Merge branch 'stable'
2011-09-23 09:04:52 -04:00
Mark DePristo
2bb77a7978
Docs for all VariantAnnotator annotations
2011-09-23 09:04:16 -04:00
Mark DePristo
dd65ba5bae
@Hidden for DocumentationTest and GATKDocsExample
2011-09-23 09:03:37 -04:00
Mark DePristo
dfce301beb
Looks for @Hidden annotation on all classes and excludes them from the docs
2011-09-23 09:03:04 -04:00
Mark DePristo
106a26c42d
Minor file cleanup
2011-09-23 08:25:20 -04:00
Mark DePristo
a9f073fa68
Genotype merging unit tests for simpleMerge
...
-- Remaining TODOs are all for GdA
2011-09-23 08:24:49 -04:00
Mark DePristo
4397ce8653
Moved removePLs to VariantContextUtils
2011-09-23 08:24:20 -04:00
Eric Banks
a8e0fb26ea
Updating md5 because the file changed
2011-09-23 07:33:20 -04:00
Mark DePristo
c49cc623de
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-22 17:26:21 -04:00
Mark DePristo
dab7232e9a
simpleMerge UnitTest for not annotating and annotating to different info key
2011-09-22 17:26:11 -04:00
Mark DePristo
30ab3af0c8
A few more simpleMerge UnitTest tests for filtered vcs
2011-09-22 17:14:59 -04:00
Mark DePristo
5cf82f9236
simpleMerge UnitTest tests filtered VC merging
2011-09-22 17:05:12 -04:00
Mark DePristo
46ca33dc04
TestDataProvider now can be named
2011-09-22 17:04:32 -04:00
Mauricio Carneiro
96c875399c
Merging many bug fixes to reduce reads
2011-09-22 17:04:11 -04:00
Mauricio Carneiro
39b54211d0
Fixed hard clipping soft clipped bases after hard clips
...
if soft clipped bases were after a hard clipped section of the read, the hard clip was clipping the left soft clip tail as if it were a right tail. Mayhem.
2011-09-22 15:46:55 -04:00
Mark DePristo
68da555932
UnitTest for simpleMerge for alleles
2011-09-22 15:16:37 -04:00
Mauricio Carneiro
1acf7945c5
Fixed hard clipped cigar and alignment start
...
* Hard clipped Cigar now includes all insertions that were hard clipped and not the deletions.
* The alignment start is now recalculated according to the new hard clipped cigar representation
2011-09-22 14:51:14 -04:00
Eric Banks
80d7300de4
Unit test was passing in FORMAT as one of the sample names. There used to be a hack in the VCFHeader to check for this and remove it and I couldn't figure out why, but now I know. Hack was removed and now the unit test passes in only the sample names as per the contract.
2011-09-22 13:28:42 -04:00
Mauricio Carneiro
4e9020c9f7
Fixed alignment start for hard clipping insertions
2011-09-22 13:28:25 -04:00
Eric Banks
9c1728416c
Revert "Updating md5 for fixed file" because this was fixed properly in unstable (but will break SnpEff if put into Stable).
...
This reverts commit 6b4182c6ab3e214da4c73bc6f3687ac6d1c0b72c.
2011-09-22 13:16:42 -04:00
Eric Banks
888d8697b1
Merged bug fix from Stable into Unstable
2011-09-22 13:16:31 -04:00
Eric Banks
15a410b24b
Updating md5 for fixed file
2011-09-22 13:15:41 -04:00
Mark DePristo
ba5f83fee2
start of VariantContextUtils UnitTest
...
-- tests rsID merging
2011-09-22 12:10:39 -04:00
Mark DePristo
93dd1faa5f
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-22 11:20:10 -04:00
Mark DePristo
a05c959e5a
Empty unit tests for VariantContextUtils
...
-- will be expanded over the day
2011-09-22 11:20:07 -04:00
Mark DePristo
3fdee2b9ed
Merge from stable into unstable
2011-09-22 11:19:43 -04:00
Christopher Hartl
4f4a0fc38a
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git
2011-09-22 11:01:58 -04:00
Christopher Hartl
982c47bfa7
Remove duplicate effort in ReadUtils (with apologies to Mauricio)
...
Big (but not major) cleanup of code in ILG - mostly excising the old likelihood model
Activated the early-abort check for ILG. I think it should be better this way.
2011-09-22 10:58:26 -04:00
Mark DePristo
c514df6d18
Merge of stable into unstable
2011-09-22 10:34:27 -04:00
Mark DePristo
f81a41b889
Updating MD5s for CombineVariants
...
-- Old version had broken RSIDs, new version is fixed. No longer see rs1234,. as it is now just rs1234
2011-09-22 10:30:25 -04:00
Eric Banks
b8ea9ceb68
Adding integration test that uses the -V:dbsnp binding to make sure it won't fail later on if someone messes with Tribble.
2011-09-21 22:43:31 -04:00
Eric Banks
8f8b59a932
My interpretation of the VCF spec is that the FORMAT field should only be present if there is genotype/sample data. So the VCFCodec now throws an exception when it encounters such a case. I had to fix one of the integration test VCFs.
2011-09-21 22:23:28 -04:00
Christopher Hartl
dc96f6da79
Merge branch 'master' of ssh://chartl@gsa2/humgen/gsa-scr1/chartl/dev/git
2011-09-21 18:18:41 -04:00
Christopher Hartl
f9cdc119af
Added a method to ReadUtils that converts reads of the form 10S20M10S to 40M (just unclips the soft-clips).
...
Be careful when using this - if you're writing a bam file it will be potentially written out of order (since the previous alignment start was at the M, not the S).
2011-09-21 18:16:42 -04:00
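The soft-clip unclipping described in commit f9cdc119af can be sketched on plain CIGAR strings. This is an illustration under stated assumptions: the class `SoftClipSketch` and both method names are hypothetical, and the real ReadUtils method works on read objects rather than strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the ReadUtils implementation.
final class SoftClipSketch {
    // One CIGAR element: a length followed by a SAM operator.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Rewrites every S element as M and merges adjacent elements with the same operator. */
    static String unclipSoftClips(String cigar) {
        StringBuilder out = new StringBuilder();
        int pendingLength = 0;
        char pendingOp = 0;
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            int length = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'S') {
                op = 'M'; // treat soft-clipped bases as aligned
            }
            if (op == pendingOp) {
                pendingLength += length; // merge with the previous element
            } else {
                if (pendingLength > 0) {
                    out.append(pendingLength).append(pendingOp);
                }
                pendingOp = op;
                pendingLength = length;
            }
        }
        if (pendingLength > 0) {
            out.append(pendingLength).append(pendingOp);
        }
        return out.toString();
    }

    /** New alignment start after unclipping: shifted left by the leading soft clip. */
    static int unclippedStart(int alignmentStart, String cigar) {
        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'H') {
                continue; // hard clips do not move the start in this sketch
            }
            return op == 'S' ? alignmentStart - Integer.parseInt(m.group(1)) : alignmentStart;
        }
        return alignmentStart;
    }
}
```

The shifted start is what makes the ordering warning in the commit message matter: a read that was sorted by its original alignment start may now begin before its predecessor.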
Christopher Hartl
faff6e4019
Failed to commit changes to the GATKReport required for easier access when using the files as data sources (read: histograms) for walkers
2011-09-21 18:15:23 -04:00
Mauricio Carneiro
96768c8a18
Sending latest bug fixes to Reduce Reads to the main repository
2011-09-21 17:43:11 -04:00
Mauricio Carneiro
70335b2b0a
Hard clipping soft clipped reads to fix misalignments.
...
Pre-softclipped reads (with high qual) are a complicated event to deal with in the Reduced Reads environment. I chose to hard clip them out for now and added a todo item to bring them back on in the future, perhaps as a variant region.
2011-09-21 17:12:01 -04:00
Christopher Hartl
ef05827c7b
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-21 16:40:47 -04:00
Christopher Hartl
3b51d9106a
Adding in likelihood calculations for mendelian violations. Also fixing a minor and rare bug in SelectVariants when specifying family structure on the command line.
2011-09-21 16:40:29 -04:00
Mark DePristo
04968c88b3
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-21 15:43:25 -04:00
Mark DePristo
6bcfce225f
Fix for dynamic type determination for bgzip files
...
-- GZipInputStream handles bgzip files under linux, but not mac
-- Added BlockCompressedInputStream test as well, which works properly on bgzip files
2011-09-21 15:39:19 -04:00
Mark DePristo
9f6f0c443c
Marginally cleaner isVCFStream() function
...
-- cleanup trying to debug minor bug. Failed to fix the bug, but the code is nicer now
2011-09-21 15:25:01 -04:00
Ryan Poplin
5fef6dc5d0
Merged bug fix from Stable into Unstable
2011-09-21 15:23:06 -04:00
Ryan Poplin
2585fc3d6c
Updating Rscript path doc text for Broad users
2011-09-21 15:22:26 -04:00
Mark DePristo
74f9ccf6dd
Merge
2011-09-21 11:30:11 -04:00
Mark DePristo
6592972f82
Putative fix for BAQ array out of bounds
...
-- Old code required qual to be <64, which isn't strictly necessary. Now uses the Picard SAMUtils.MAX_PHRED_SCORE constant
-- Unittest to enforce this behavior
2011-09-21 11:25:08 -04:00
Eric Banks
174859fc68
Don't allow whitespace in the INFO field
2011-09-21 11:14:54 -04:00
Mark DePristo
ecc7f34774
Putative fix for BAQ problem.
2011-09-21 11:09:54 -04:00
Mark DePristo
7d11f93b82
Final bugfix for CombineVariants
...
-- Now handles multiple records at a site, so that you don't see records like set=dbsnp-dbsnp-dbsnp when combining something with dbsnp
-- Proper handling of ids. If you are merging files with multiple ids for the same record, the ids are merged into a comma separated list
2011-09-21 10:58:32 -04:00
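The ID handling described in commit 7d11f93b82 (merge IDs into a comma-separated list, without artifacts such as `rs1234,.`) can be sketched as below. The class `IdMergeSketch` and method `mergeIds` are hypothetical names for illustration; the sketch assumes the VCF convention that `.` denotes a missing ID and is not taken from the CombineVariants source.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the CombineVariants implementation.
final class IdMergeSketch {
    /**
     * Merges record IDs into one comma-separated string, dropping missing
     * values (".") and duplicates while preserving first-seen order.
     */
    static String mergeIds(List<String> ids) {
        Set<String> merged = new LinkedHashSet<>();
        for (String id : ids) {
            if (id == null) {
                continue;
            }
            // An input ID may itself already be a comma-separated list.
            for (String part : id.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty() && !trimmed.equals(".")) {
                    merged.add(trimmed);
                }
            }
        }
        // Fall back to the missing-value marker when nothing usable remains.
        return merged.isEmpty() ? "." : String.join(",", merged);
    }
}
```

A LinkedHashSet is used here so that the output order is deterministic, which keeps MD5-based integration tests stable across runs.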
Mark DePristo
a91ac0c5db
Intermediate commit of bugfixes to CombineVariants
2011-09-21 10:15:05 -04:00
David Roazen
b04d8eab55
Merged bug fix from Stable into Unstable
2011-09-20 17:24:14 -04:00
Mauricio Carneiro
758ecf2d43
Bringing latest updates of ReduceReads to the master repository
2011-09-20 16:35:09 -04:00
David Roazen
d9ea764611
SnpEff annotator now adds OriginalSnpEffVersion and OriginalSnpEffCmd lines to the header of the VCF output file.
...
This change is urgently required for production, which is why it's going into Stable+Unstable
instead of just Unstable.
The keys for the SnpEff version and command header lines in the VCF file output by
VariantAnnotator (OriginalSnpEffVersion and OriginalSnpEffCmd) are intentionally
different from the keys for those same lines in the SnpEff output file (SnpEffVersion
and SnpEffCmd), so that output files from VariantAnnotator won't be confused
with output files from SnpEff itself.
2011-09-20 16:30:55 -04:00
Mark DePristo
bffd3cca6f
Bug fix for reduced read; only adds regular bases for calculation
...
-- No longer passes on deletions for genotyping
2011-09-20 15:07:06 -04:00
Mark DePristo
a1b4cafe7a
Bug fix for NPE when timer wasn't initialized
2011-09-20 13:59:59 -04:00
Mark DePristo
b7511c5ff3
Fixed long-standing bug in tribble index creation
...
-- Previously, on the fly indices didn't have dictionary set on the fly, so the GATK would read, add dictionary, and rewrite the index. This is now fixed, so that the on the fly index contains the reference dictionary when first written, avoiding the unnecessary read and write
-- Added a GenomeAnalysisEngine and Walker function called getMasterSequenceDictionary() that fetches the reference sequence dictionary. This can be used conveniently everywhere, and is what's written into the Tribble index
-- Refactored tribble index utilities from RMDTrackBuilder into IndexDictionaryUtils
-- VCFWriter now requires the master sequence dictionary
-- Updated walkers that create VCFWriters to provide the master sequence dictionary
2011-09-20 10:53:18 -04:00
Mark DePristo
230e16d7c0
Merge branch 'master' into rodrewrite
2011-09-20 06:54:18 -04:00
Mark DePristo
aa8afa3899
Merge
2011-09-19 21:16:47 -04:00
Mauricio Carneiro
56106d54ed
Changing ReadUtils behavior to comply with GenomeLocParser
...
Now the functions getRefCoordSoftUnclippedStart and getRefCoordSoftUnclippedEnd will return getUnclippedStart if the read is all contained within an insertion. Updated the contracts accordingly. This should give the same behavior as the GenomeLocParser now.
2011-09-19 14:00:00 -04:00
Mauricio Carneiro
080c957547
Fixing contracts for SoftUnclippedEnd utils
...
Now accepts reads that are entirely contained inside an insertion.
2011-09-19 13:53:53 -04:00
Mauricio Carneiro
5e832254a4
Fixing ReadAndInterval overlap comments.
2011-09-19 13:28:41 -04:00
Christopher Hartl
ecb8466662
Merged bug fix from Stable into Unstable
2011-09-19 12:32:08 -04:00
Christopher Hartl
8143def292
Fix the -T argument in the DepthOfCoverage docs
...
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 12:31:47 -04:00
Christopher Hartl
034b868588
Revert "Fix the -T argument in the DepthOfCoverage docs"
...
This reverts commit 0994efda998cf3a41b1a43696dbc852a441d5316.
2011-09-19 12:16:07 -04:00
Mark DePristo
cfde0e674b
Merge branch 'sgintervals'
2011-09-19 12:02:41 -04:00
Mark DePristo
3e93f246f7
Support for sample sets in AssignSomaticStatus
...
-- Also cleaned up SampleUtils.getSamplesFromCommandLine() to return a set, not a list, and trim the sample names.
2011-09-19 11:40:45 -04:00
Mark DePristo
41ffb25b74
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-19 10:55:18 -04:00
Christopher Hartl
ca1b30e4a4
Fix the -T argument in the DepthOfCoverage docs
...
Add documentation for the RefSeqCodec, pointing users to the wiki page describing how to create the file
2011-09-19 10:29:06 -04:00
Mark DePristo
4ad330008d
Final intervals cleanup
...
-- No functional changes (my algorithm wouldn't work)
-- Major structural cleanup (returning more basic data structures that allow us to develop a new algorithm)
-- Unit tests for the efficiency of interval partitioning
2011-09-19 10:19:10 -04:00
Mark DePristo
6ea57bf036
Merge branch 'master' into sgintervals
2011-09-19 09:50:19 -04:00
Mark DePristo
6bd42c053d
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-18 20:18:39 -04:00
Roger Zurawicki
091c7197cd
Fixed memory leak and bug with deletions in clipping
...
The ClippingOp clip cigar function would enter an endless loop if the parameters were outside the read's range; this is now fixed.
* There is still no check that the requested coordinates are covered by the read, though
When hard clipping to an interval, I added a check for deletions.
NOTE: method works for NA12878 WEx but needs to be more thoroughly tested/optimized
2011-09-18 19:21:51 -04:00
Guillermo del Angel
7fa1e237d9
Forgot to git stash pop new MD5's for CombineVariants integration test
2011-09-16 12:53:54 -04:00
Guillermo del Angel
e7b9a009b7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-16 12:48:30 -04:00
Menachem Fromer
b2e8e11128
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-16 00:52:27 -04:00
Christopher Hartl
57b3efa2e2
Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 21:06:38 -04:00
Christopher Hartl
939babc820
Updating formatting for ValidationAmplicons GATK docs
2011-09-15 21:05:51 -04:00
Christopher Hartl
9fdf1f8eb6
Fix some doc formatting for Depth of Coverage
2011-09-15 21:05:22 -04:00
Menachem Fromer
e6e9b08c9a
Must provide alleles VCF to UGCallVariants
2011-09-15 18:51:09 -04:00
David Roazen
d78e00e5b2
Renaming VariantAnnotator SnpEff keys
...
This is to head off potential confusion with the output from the SnpEff tool itself,
which also uses a key named EFF.
2011-09-15 17:42:15 -04:00
Eric Banks
1971fb35d7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 16:55:33 -04:00
Eric Banks
9dc6354130
Oops didn't mean to touch this test before
2011-09-15 16:55:24 -04:00
Ryan Poplin
2a8b8efd2f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 16:26:35 -04:00
Ryan Poplin
2f58fdb369
Adding expected output doc to CountCovariates
2011-09-15 16:26:11 -04:00
Eric Banks
fd1831b4a5
Updating docs to include more details
2011-09-15 16:25:03 -04:00
Eric Banks
6d02a34bfb
Updating docs to include output
2011-09-15 16:17:54 -04:00
Eric Banks
4ef6a4598c
Updating docs to include output
2011-09-15 16:10:34 -04:00
Eric Banks
fe474b77f8
Updating docs so printing looks nicer
2011-09-15 16:05:39 -04:00
Eric Banks
f04e51c6c2
Adding docs from Andrey since his repo was all screwed up.
2011-09-15 15:38:56 -04:00
Guillermo del Angel
86480b2e13
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-15 15:31:07 -04:00
Eric Banks
d369d10593
Adding documentation before the release for GATK wiki page
2011-09-15 13:56:23 -04:00
Eric Banks
202405b1a1
Updating the FunctionalClass stratification in VariantEval to handle the snpEff annotations; this change really needs to be in before the release so that the pipeline can output semi-meaningful plots. This commit maintains backwards compatibility with the crappy Genomic Annotator output. However, I did clean up the code a bit so that we now use an Enum instead of hard-coded values (so it's now much easier to change things if we choose to do so in the future). I do not see this as the final commit on this topic - I think we need to make some changes to the snpEff annotator to preferentially choose certain annotations within effect classes; Mark, let's chat about this for a bit when you get back next week. Also, for the record, I should be blamed for David's temporary commit the other day because I gave him the green light (since when do you care about backwards compatibility anyways?). In any case, at least now we have something that works for both the old and new annotations.
2011-09-15 13:52:31 -04:00
David Roazen
1e682deb26
Minor html-formatting-related documentation fix to the SnpEff class.
2011-09-15 13:07:50 -04:00
Guillermo del Angel
a942fa38ef
Refine the way we merge records of different types in CombineVariants. As before, two records of different types are not combined and are kept separate, except when the alleles of one record are a strict subset of the alleles of another record. For example, a SNP with alleles {A*,T} and a mixed record with alleles {A*,T,AAT} are now combined when the start position matches.
2011-09-15 10:22:28 -04:00
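The merge rule in the CombineVariants commit above (a942fa38ef) can be illustrated with a small sketch. This is not the GATK implementation: the function name `can_merge` and the plain-set representation of alleles are assumptions made for illustration only.

```python
def can_merge(start_a, alleles_a, start_b, alleles_b):
    """Hypothetical check: two records of different types merge only when
    they start at the same position and one allele set is a strict
    (proper) subset of the other."""
    if start_a != start_b:
        return False
    a, b = set(alleles_a), set(alleles_b)
    # '<' on Python sets tests for a proper subset.
    return a < b or b < a


# The example from the commit message: SNP {A*,T} vs mixed {A*,T,AAT}.
print(can_merge(100, {"A*", "T"}, 100, {"A*", "T", "AAT"}))
```

Records whose allele sets merely overlap, such as {A*,T} and {A*,G}, stay separate under this rule.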
David Roazen
3db457ed01
Revert "Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames"
...
After discussing this with Mark, it seems clear that the old version of the
VariantEval FunctionalClass stratification is preferable to this version.
By reverting, we maintain backwards compatibility with legacy output files
from the old GenomicAnnotator, and can add SnpEff support later without
breaking that backwards compatibility.
This reverts commit b44acd1abd9ab6eec37111a19fa797f9e2ca3326.
2011-09-14 10:47:28 -04:00
David Roazen
e0c8c0ddcb
Modified VariantEval FunctionalClass stratification to remove hardcoded GenomicAnnotator keynames
...
This is a temporary and hopefully short-lived solution. I've modified
the FunctionalClass stratification to stratify by effect impact as
defined by SnpEff annotations (high, moderate, and low impact) rather
than by the silent/missense/nonsense categories.
If we want to bring back the silent/missense/nonsense stratification,
we should probably take the approach of asking the SnpEff author
to add it as a feature to SnpEff rather than coding it ourselves,
since the whole point of moving to SnpEff was to outsource genomic
annotation.
2011-09-14 07:09:47 -04:00
David Roazen
1213b2f8c6
SnpEff 2.0.2 support
...
-Rewrote SnpEff support in VariantAnnotator to support the latest SnpEff release (version 2.0.2)
-Removed support for SnpEff 1.9.6 (and associated tribble codec)
-Will refuse to parse SnpEff output files produced by unsupported versions (or without a version tag)
-Correctly matches ref/alt alleles before annotating a record, unlike the previous version
-Correctly handles indels (again, unlike the previous version)
2011-09-14 07:09:47 -04:00
Guillermo del Angel
5b1bf6e244
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-13 17:04:43 -04:00
Guillermo del Angel
c6672f2397
Intermediate (but necessary) fix for Beagle walkers: if a marker is absent in the Beagle output files, but present in the input vcf, there's no reason why it should be omitted in the output vcf. Rather, the vc is written as is from the input vcf
2011-09-13 16:57:37 -04:00
Mark DePristo
edf29d0616
Explicit info message about uploading S3 log
2011-09-12 22:16:52 -04:00
Mark DePristo
2316b6aad3
Trying to fix problems with S3 uploading behind firewalls
...
-- Cannot reproduce the very long waits reported by some users.
-- Fixed a problem where an exception could leave an undeleted file behind; this is now handled with deleteOnExit()
2011-09-12 22:02:42 -04:00
Matt Hanna
64707c33bb
Merged bug fix from Stable into Unstable
2011-09-12 21:54:11 -04:00
Matt Hanna
e63d9d8f8e
Mauricio pointed out to me that dynamic merging the unmapped regions of multiple BAMs ('-L unmapped' with a BAM list)
...
was completely broken. Sorry about this! Fixed.
2011-09-12 21:50:59 -04:00
Eric Banks
ec4b30de6d
Patch from Laurent: typo leads to bad error messages.
2011-09-12 14:45:53 -04:00
David Roazen
9d9d438bc4
New VariantAnnotatorEngine capability: an initialize() method for all annotation classes.
...
All VariantAnnotator annotation classes may now have an (optional) initialize() method
that gets called by the VariantAnnotatorEngine ONCE before annotation starts.
As an example of how this can be used, the SnpEff annotation class will use the initialize()
method to check whether the SnpEff version number stored in the vcf header is a supported
version, and also to verify that its required RodBinding is present.
2011-09-12 13:00:53 -04:00
Ryan Poplin
981b78ea50
Changing the VQSR command line syntax back to the parsed tags approach. This cleans up the code and makes sure we won't be parsing the same rod file multiple times. I've tried to update the appropriate qscripts.
2011-09-12 12:17:43 -04:00
Ryan Poplin
60ebe68aff
Fixing issue in VariantEval in which insertion and deletion events weren't treated symmetrically. Added new option to require strict allele matching.
2011-09-12 09:43:23 -04:00
Guillermo del Angel
9344938360
Uncomment code to add deleted bases covering an indel to per-sample genotype reporting, update integration tests accordingly
2011-09-10 19:41:01 -04:00
Guillermo del Angel
b399424a9c
Fix integration test affected by non-calling all-zero PL samples, and add a more complicated multi-sample integration test from a phase 1 case, GBR with mixed technologies and complex input alleles
2011-09-09 20:44:47 -04:00
Guillermo del Angel
e95d484757
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-09 18:31:14 -04:00
Guillermo del Angel
a807205fc3
a) Minor optimization to softMax() computation to avoid redundant operations, results in about 5-10% increase in speed in indel calling.
...
b) Added (but left commented out since it may affect integration tests and to isolate commits) fix to per-sample DP reporting, so that deletions are included in count.
c) Bug fix to avoid having non-reference genotypes assigned to samples with PL=0,0,0. Correct behavior should be to no-call these samples, and to ignore these samples when computing AC distribution since their likelihoods are not informative.
2011-09-09 18:00:23 -04:00
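Item (c) of the commit above (a807205fc3) describes no-calling samples whose PLs are all zero. A minimal sketch of that idea follows; the function names are hypothetical and the real genotyping code is considerably more involved. It relies only on the VCF convention that PL values are phred-scaled, with 0 marking the most likely genotype.

```python
def is_informative(pls):
    """A PL vector of all zeros says every genotype is equally likely."""
    return any(pl != 0 for pl in pls)


def call_genotype(pls):
    """Return the index of the most likely genotype, or None for a no-call.

    Lower PL means more likely, so the best genotype has the minimum PL.
    """
    if not is_informative(pls):
        return None
    return min(range(len(pls)), key=lambda i: pls[i])


def informative_samples(sample_pls):
    """Keep only samples that should contribute to the AC distribution."""
    return {name: pls for name, pls in sample_pls.items() if is_informative(pls)}
```

With this sketch, a sample with PL=0,0,0 gets `None` (a no-call) and is dropped before any allele-count computation.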
Mauricio Carneiro
9e650dfc17
Fixing SelectVariants documentation
...
getting rid of messages telling users to go for the YAML file. The idea is to not support these anymore.
2011-09-09 16:25:31 -04:00
Mark DePristo
72536e5d6d
Done
2011-09-09 15:44:47 -04:00
Mark DePristo
3c8445b934
Performance bugfix for GenomeLoc.hashcode
...
-- old version overflowed so most GenomeLocs had 0 hashcode. Now uses or, not plus, to combine
2011-09-09 14:25:37 -04:00
Mark DePristo
c6436ee5f0
Whitespace cleanup
2011-09-09 14:24:29 -04:00
Mark DePristo
87dc5cfb24
Whitespace cleanup
2011-09-09 14:23:42 -04:00
Ryan Poplin
1953edcd2d
updating Validate Variants deletion integration test
2011-09-09 13:39:08 -04:00
Ryan Poplin
9ada9b3ed4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-09 13:15:36 -04:00
Ryan Poplin
354529bff3
adding Validate Variants integration test with a deletion
2011-09-09 13:15:24 -04:00
Ryan Poplin
91c949db74
Fixing ValidateVariants so that it validates deletion records. Fixing GATKdocs.
2011-09-09 12:57:14 -04:00
Mark DePristo
06cb20f2a5
Intermediate commit cleaning up scatter intervals
...
-- Adding unit tests to ensure uniformity of intervals
2011-09-09 12:56:45 -04:00
Eric Banks
51eb95d638
Missed these tests before
2011-09-09 11:46:37 -04:00
Eric Banks
6ad8943ca0
CompOverlap no longer keeps track of the number of comp sites since it wasn't (and cannot be) keeping track of them correctly.
2011-09-09 09:45:24 -04:00
Mark DePristo
507574b1c8
Merge branch 'cancer'
2011-09-08 16:10:02 -04:00
Mark DePristo
48461b34af
Added TYPE argument to print out VariantType
2011-09-08 15:01:13 -04:00
Eric Banks
eaaba6eb51
Confirming that when stratifying by sample in VE the monomorphic sites for a given sample are not counted for the relevant metrics. Adding integration test to cover it.
2011-09-08 13:17:34 -04:00
Ryan Poplin
2636d216de
Adding indel vqsr integration test
2011-09-08 10:38:13 -04:00
Ryan Poplin
9cba1019c8
Another fix for genotype given alleles for indels. Expanding the indel integration tests to include multiallelics and indel records that overlap
2011-09-08 09:25:13 -04:00
Ryan Poplin
e0020b2b29
Fixing PrintRODs. Now has input and only prints out one copy of each record
2011-09-08 08:58:37 -04:00
Ryan Poplin
29c968ab60
clean up
2011-09-08 08:42:43 -04:00
Ryan Poplin
59841f8232
Fixing genotype given alleles for indels. Only take the records that start at this locus.
2011-09-08 08:41:16 -04:00
Mark DePristo
cd2c511c4a
GCF improvements
...
-- Support for streaming VCF writing via the VCFWriter interface
-- GCF now has a header and a footer. The header is minimal, and contains a forward pointer to the position of the footer in the file.
-- Readers now read the header, and then jump to the footer to get the rest of the "header" information
-- Version now a field in GCF
2011-09-07 23:28:46 -04:00
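The header-with-forward-pointer layout described in the GCF commit above (cd2c511c4a) can be sketched as follows. The byte layout, magic value, and function names here are invented for illustration and are not the actual GCF format; the sketch only shows a minimal header whose offset field is patched once the footer position is known.

```python
import io
import struct

MAGIC = b"DEMO"                   # hypothetical magic bytes, not GCF's
HEADER = struct.Struct("<4sIq")   # magic, version, footer offset (16 bytes)


def write_file(records, footer, version=1):
    """Write header placeholder, stream records, append footer, patch header."""
    buf = io.BytesIO()
    buf.write(HEADER.pack(MAGIC, version, -1))    # offset not known yet
    for rec in records:
        buf.write(struct.pack("<I", len(rec)))   # length-prefixed record
        buf.write(rec)
    footer_offset = buf.tell()
    buf.write(footer)
    buf.seek(0)
    buf.write(HEADER.pack(MAGIC, version, footer_offset))
    return buf.getvalue()


def read_footer(data):
    """Read the minimal header, then jump straight to the footer."""
    magic, version, footer_offset = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a demo file")
    return version, data[footer_offset:]
```

A layout like this lets a writer stream records before it knows everything that belongs in the "header" information, at the cost of needing a seekable output to patch the offset.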
Mark DePristo
fe5724b6ea
Refactored indexing part of StandardVCFWriter into superclass
...
-- Now other implementations of the VCFWriter can easily share common functions, such as writing an index on the fly
2011-09-07 23:27:08 -04:00
Mark DePristo
01b6177ce1
Renaming GVCF -> GCF
2011-09-07 17:10:56 -04:00
Mark DePristo
b220ed0d75
Merge branch 'master' into rodrewrite
2011-09-07 17:05:35 -04:00
Guillermo del Angel
45d54f6258
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 16:49:49 -04:00
Guillermo del Angel
9604fb2ba3
Necessary but not sufficient step to fix GenotypeGivenAlleles mode in UG which is now busted
2011-09-07 16:49:16 -04:00
Mark DePristo
2ded027762
Removed dysfunctional tranches support from VariantEval
2011-09-07 16:09:24 -04:00
Eric Banks
aa9e32f2f1
Reverting Mark's previous commit as per the open discussion. Now the eval modules check isPolymorphic() before accruing stats when appropriate. Fixed the IndelLengthHistogram module not to error out if the indel isn't simple (that would have been bad). Only integration test that needed to be updated was the tranches one based on a separate commit from Mark.
2011-09-07 15:48:06 -04:00
Mark DePristo
d7e355b4b6
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 14:54:16 -04:00
Mark DePristo
9127849f5d
BugFix for unit test
2011-09-07 14:54:10 -04:00
Eric Banks
3a04955a30
We already had isPolymorphic and isMonomorphic in the VariantContext, but the implementation was incorrect for many edge cases (e.g. sites-only files, sites with samples who were no-called). Fixing. Moving on to VE now.
2011-09-07 14:01:42 -04:00
Guillermo del Angel
743bf7784c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:21:26 -04:00
Guillermo del Angel
5f22ef9a8c
Added missing javadoc info to Beagle arguments
2011-09-07 13:21:11 -04:00
Mark DePristo
3bcbfa6e06
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-09-07 13:13:17 -04:00
Mark DePristo
430da23446
At least 2 minutes must pass before a status message is printed, further stabilizing time estimates
2011-09-07 13:13:07 -04:00
Mauricio Carneiro
6857d0324e
Merge branch 'master' into rr
2011-09-07 12:59:08 -04:00
Mark DePristo
7e9e20fed0
Forgot to delete previous call
2011-09-07 12:54:52 -04:00
Mark DePristo
d23d620494
Pushing traversal engine timer start to as close to actual start as possible
...
-- Should make initial timings more accurate
2011-09-07 12:52:33 -04:00
Mark DePristo
6ff432e1f2
BugFix for TF argument to VariantEval, actually making it work properly
2011-09-07 12:50:17 -04:00
Mauricio Carneiro
131cb7effd
Bringing Reduce Reads bug fixes to the main repository
2011-09-07 12:25:53 -04:00
Mark DePristo
a1920397e8
Major bugfix for per sample VariantEval
...
-- per sample stratification was not being calculated correctly. The alt allele was always remaining, even if the genotype of the sample was hom-ref. Although conceptually fine, this breaks the assumptions of all of the eval modules, so per sample stratifications actually included all variants for everything. Eric is going to fix the system in general, so this commit may break the build.
2011-09-07 12:18:11 -04:00
Mark DePristo
a02636a1ac
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/ebanks/Sting_rodrefactor into rodrewrite
2011-09-07 10:50:00 -04:00
Mark DePristo
d5641cfac5
Merge branch 'variantEvalST'
2011-09-07 10:44:23 -04:00
Mark DePristo
2f4cf82e3b
VariantEval cleanup. Added VariantType Stratification
...
-- ArrayLists are declared as List where possible
-- states refactored into VariantStratifier base class (reduces many lines of duplicate code)
-- Added VariantType stratification that partitions report by VariantContext.Type
2011-09-07 10:43:53 -04:00
Christopher Hartl
436f6eb52b
Reverting Eric's change and pushing in some command-line-option documentation.
2011-09-07 08:53:30 -04:00
Eric Banks
1ef8a1750a
I asked nicely and got nothing. Then I threatened and still got nothing. So I am carrying through on my threats. Guillermo, you have a short reprieve because you were away on vacation, but let's get yours done tomorrow afternoon.
2011-09-06 21:07:49 -04:00
Eric Banks
da9c8ab386
Revving the Tribble jar where the DbsnpCodec class was renamed to OldDbsnpCodec. Updating GATK code accordingly.
2011-09-06 20:39:42 -04:00
Mark DePristo
3db7ecb920
ReducedRead flag cached in GATKSAMRecord. 20% performance improvement
2011-09-06 15:11:38 -04:00
Roger Zurawicki
47607a7eff
Fixed bug where deletions messed up interval clipping
...
- Instead of using readLength, the ReadUtils functions are used to get a proper read coordinate
- Added debug info in interval clipping ( with -dl)
NOTE: method might not be safe for production and checks need to be added to the ClippingOp code
2011-09-06 14:25:57 -04:00
Khalid Shakir
0adb388dee
Fixed bug in SelectVariants that was annotating sample_file / exclude_sample_file as @Argument instead of @Input meaning they weren't tracked in Queue.
...
Updates for HybridSelectionPipeline:
- Use VQSR on SNPs for projects using bait set whole_exome_agilent_1 and applying cut at 98.5.
- If a whole_exome_agilent_1 project has fewer than 50 samples, also mix in 1000G samples to reach VQSR thresholds.
- Updated SNP hard filters based on analysis done with ebanks to approximate VQSR results on small target batches.
- Removed GSA_PRODUCTION_ONLY flag from indel caller.
- Updated indel hard filters based on delangel's analysis.
- Updated HybridSelectionPipelineTest to use HARD SNP filters only, for now.
2011-09-06 12:41:46 -04:00
Mark DePristo
d471617c65
GATK binary VCF (gvcf) prototype format for efficiency testing
...
-- Very minimal working version that can read / write binary VCFs with genotypes
-- Already 10x faster for sites, 5x for fully parsed genotypes, and 1000x for skipping genotypes when reading
2011-09-02 21:15:19 -04:00
Mark DePristo
048202d18e
Bugfix for cached quals
2011-09-02 21:13:28 -04:00
Mark DePristo
03aa04e37c
Simple refactoring to make formatting functions public
2011-09-02 21:13:08 -04:00
Mark DePristo
124ef6c483
MISSING_VALUE now gets defaultValue in getAttribute functions
2011-09-02 21:12:28 -04:00
Mark DePristo
82f2131777
Simplified getAttributeAsX interfaces
...
-- Removed the getAttributeAsX(key) versions that throw an exception when the value is missing.
-- Removed the getAttributeAsXNoException(key) versions.
-- The only available accessors are now getAttributeAsX(key, default).
-- These accessors properly handle their argument types, so if the value is a double it is returned directly for getAttributeAsDouble(), or if it's a string it's converted to a double. If the key isn't found, default is returned.
2011-09-02 12:27:11 -04:00
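The accessor contract described in the commit above (82f2131777) can be sketched in a few lines. This is an illustrative Python analogue, not the Java code: `get_attribute_as_double` and the plain-dict attribute map are assumptions.

```python
def get_attribute_as_double(attributes, key, default):
    """Analogue of getAttributeAsDouble(key, default).

    - key absent        -> default is returned
    - value is a float  -> returned directly
    - value is a string -> converted to a float
    """
    value = attributes.get(key)
    if value is None:
        return default
    if isinstance(value, float):
        return value
    return float(value)


attrs = {"QD": 12.5, "FS": "3.25"}
print(get_attribute_as_double(attrs, "QD", -1.0))   # stored as a float
print(get_attribute_as_double(attrs, "FS", -1.0))   # stored as a string
print(get_attribute_as_double(attrs, "MQ", -1.0))   # missing key
```

Having a single accessor that always takes a default removes the need for separate throwing and non-throwing variants.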
Mauricio Carneiro
08ae6c0c61
ReadClipper is now handling unmapped reads
2011-09-02 11:32:30 -04:00
Mark DePristo
c57198a1b9
Optimizations in VCFCodec
...
-- Don't create an empty LinkedHashSet() for PASS fields. Just return Collections.emptySet() instead.
-- For filter fields with actual values, returns an unmodifiableSet instead of one that can be changed
2011-09-02 08:46:17 -04:00
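The two VCFCodec optimizations in the commit above (c57198a1b9) have a close analogue in Python: reuse one shared empty immutable set for PASS, and return an immutable set otherwise. The sketch below is only that analogue; `parse_filters` is a hypothetical name, and the handling of "." (missing filter) is deliberately left out because the commit does not describe it.

```python
_NO_FILTERS = frozenset()  # shared instance, plays the role of Collections.emptySet()


def parse_filters(field):
    """Parse a VCF FILTER column value into an immutable set of filter names."""
    if field == "PASS":
        # No allocation per record: every passing site shares one empty set.
        return _NO_FILTERS
    # FILTER values are semicolon-separated; frozenset cannot be modified.
    return frozenset(field.split(";"))
```

Because most sites in a typical file pass all filters, avoiding a fresh set per record saves one allocation on the common path.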
Mark DePristo
c3ea96d856
Removing many unused functions of questionable purpose
2011-09-02 08:42:01 -04:00
Eric Banks
d241f0e903
Adding docs for the pcr error rate argument.
2011-09-01 21:57:02 -04:00
Eric Banks
827fe6130c
Adding hidden printing option. Also, always run UG in mode GENOTYPE_GIVEN_ALLELES given that we don't actually test for the correct alleles (otherwise UG may choose a different allele and we may falsely validate the wrong one).
2011-09-01 11:40:35 -04:00
Mark DePristo
1aa4b12ff0
Reduced the number of combinations being tested here, which was overkill
2011-09-01 10:42:43 -04:00
Mark DePristo
ac49b8d26b
Conditional support for PerformanceTrackingQuerySource to measure Tribble / GATK bridge performance
...
-- Removed DEBUG option, instead use MEASURE_TRIBBLE_QUERY_PERFORMANCE in RMDTrackerBuilder
2011-09-01 10:41:55 -04:00
Mauricio Carneiro
4b5a7046c5
Making ReadLengthDistribution Public
...
Found this neat little walker Kiran wrote stashed in the private tree. Very useful. Generalized it a bit, added GATKDocs and moved it to public. I might include it as a QC step on the pacbio processing pipeline.
* generalize it so it works with non pair ended reads.
* generalize it to work with no read group information
2011-08-31 15:52:28 -04:00
Mauricio Carneiro
7d79de91c5
Merge branch 'master' into rr
2011-08-30 02:50:19 -04:00
Mauricio Carneiro
0cd9438ac2
fixed soft unclipped calculation
...
* getRefCoordSoftUnclippedEnd was not resetting the shift when hitting insertions. Fixed.
* getReadCoordinateForReferenceCoordinateBeforeAlignmentEnd was returning the wrong read coordinate position. Fixed.
2011-08-30 02:45:29 -04:00
Mauricio Carneiro
fd540592ab
Added RMS calculation for consensus MQ
...
Consensus MQ is now the average of the RMS of the mapping qualities of the reads making each site.
2011-08-30 02:45:20 -04:00
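The consensus MQ calculation described in the commit above (fd540592ab) reads as: take the root mean square of the mapping qualities at each site, then average those per-site values. A minimal sketch under that reading follows; the function names and the list-of-lists input are assumptions, not the ReduceReads code.

```python
import math


def rms(values):
    """Root mean square: sqrt of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def consensus_mq(mqs_per_site):
    """Average of the per-site RMS mapping qualities.

    mqs_per_site holds one list of read mapping qualities per site;
    sites with no reads are skipped. At least one site must have reads.
    """
    site_rms = [rms(mqs) for mqs in mqs_per_site if mqs]
    return sum(site_rms) / len(site_rms)
```

RMS weights high mapping qualities more heavily than a plain mean, so a single low-quality read pulls a site's value down less than it would in an arithmetic average.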
Mauricio Carneiro
6f9264d2b3
Hard Clipping no longer leaves indels on the tails
...
The clipper could leave an insertion or deletion as the start or end of a read after hardclipping a read if the element adjacent to the clipping point was an indel. Fixed.
2011-08-30 02:44:58 -04:00
Mauricio Carneiro
943876c6eb
Added QUAL/MINVAR parameters to the walker
2011-08-30 02:44:46 -04:00
Mauricio Carneiro
7532be7f5a
Allowing clipping after AlignmentEnd if the end is soft clipped.
...
Read clipper now identifies and clips even if the requested coordinate is outside the alignment but the read contains soft clipped bases in that region.
2011-08-30 02:44:46 -04:00
Mauricio Carneiro
90a1f5e15c
Several bug fixes
...
* When hard clipping a read that had insertions in it, the insertion was being added to the cigar string's hard clip element. This way, the old UnclippedStart() was being modified and so was the calculation of the new AlignmentStart(). Fixed it by subtracting the number of insertions clipped from the total number of hard clipped bases.
* Walker was sending read instead of filtered read when deleting a read that contains only Q2 bases
* Sliding the window was causing reads that started on the new start position to be entirely clipped.
2011-08-30 02:44:19 -04:00
Mauricio Carneiro
66a8b36cf5
Fixed most indexing bugs
...
* added bases and quals to consensus
* fixed consensus read cigar generation.
2011-08-30 02:43:41 -04:00
Mark DePristo
1e5001b447
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-29 17:04:21 -04:00
Mark DePristo
3af001fff2
Bugfix for file that must not exist on disk
2011-08-29 17:00:10 -04:00
Mark DePristo
3b09d42ed6
Now only prints 1 warning message about duplicate headers in simpleMerge
2011-08-29 14:41:29 -04:00
Eric Banks
c2f0db969b
Don't use the default deletion value from UG if not asking to have it set
2011-08-29 13:48:10 -04:00
Eric Banks
bb7a37e8f2
We need to allow reference calls in the input VCF for the GenotypeAndValidate walker when using the BAM as truth so that we can test supposed monomorphic calls against the truth.
2011-08-29 13:19:35 -04:00
Ryan Poplin
bc252a0d62
misc minor bug fixes in assembly. Increasing the minimum number of bad variants to be used in negative model training in the VQSR
2011-08-29 08:11:31 -04:00
Mark DePristo
a5c65fc133
Debugging information to print out the Query tracks
2011-08-28 18:54:49 -04:00
Mark DePristo
7bf006278d
Moved ResolveHostname to general utils as a static function
2011-08-28 12:04:16 -04:00
Mark DePristo
ccec0b4d73
AnalyzeCovariates uses the general RScript system now
...
-- Convenience constructor for collection for testing
-- callRScript() now accepts Objects not Strings, for convenience
2011-08-27 12:54:13 -04:00
Mark DePristo
1ceb020fae
UnitTests for RScript
2011-08-27 10:50:05 -04:00
Mark DePristo
e37a638e09
Fix for disallowed characters in GATKReportTable
...
-- Illegal characters are automatically replaced with _
2011-08-26 13:24:06 -04:00
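The replacement rule in the GATKReportTable commit above (e37a638e09) amounts to a single substitution. The commit does not say which characters are disallowed, so the character class below (anything other than ASCII letters, digits, and underscore) is purely an assumption, as is the name `sanitize`.

```python
import re

# Assumed definition of "illegal": anything outside [A-Za-z0-9_].
_ILLEGAL = re.compile(r"[^A-Za-z0-9_]")


def sanitize(name):
    """Replace every disallowed character with an underscore."""
    return _ILLEGAL.sub("_", name)


print(sanitize("my table:1"))
```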
Mark DePristo
c0503283df
Spelling fix requires md5 updates
2011-08-26 07:40:44 -04:00
Mark DePristo
eef1ac415a
Merge branch 'master' into rodTesting
...
Conflicts:
public/java/src/org/broadinstitute/sting/gatk/walkers/variantutils/VariantsToTable.java
2011-08-26 00:35:41 -04:00
Eric Banks
9b7512fd94
Just because there's a ref base doesn't mean the VC needs to be padded
2011-08-25 22:42:14 -04:00
Mark DePristo
e01273ca7c
Queue now writes out queueJobReport.pdf
...
-- General purpose RScript executor in java (please use when invoking RScripts)
-- Removed groupName. This is now analysisName
-- Explicitly added capability to enable/disable individual QFunction
2011-08-25 16:57:11 -04:00
Eric Banks
09a729da3a
Removing incorrect comment
2011-08-25 15:42:52 -04:00
Eric Banks
8bbef79fc2
Create clipped alleles during allele parsing instead of creating a full VC, clipping alleles, and regenerating the VC from scratch.
2011-08-25 15:37:26 -04:00
Ryan Poplin
29c7b10f7b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-24 15:18:58 -04:00
Ryan Poplin
e5008aba00
Output the top two haplotypes as a variant call by running Smith-Waterman alignment against the reference and calling any difference as variation. This is the first version that runs end-to-end by taking in reads as a BAM file and writing out variant calls in VCF.
2011-08-24 15:18:44 -04:00
Guillermo del Angel
e618cb1e79
a) Renamed/expanded SelectVariants arguments that choose particular kinds of variants and particular allelic types, now instead of -Indels or -SNPs we can specify for example -selectType [MIXED|INDEL|SNP|MNP|SYMBOLIC]. To select biallelic, multiallelic variants, use -restrictAllelesTo [BIALLELIC|MULTIALLELIC]. Corresponding gatkdocs changes.
...
b) More useful AC,AF logging in VariantsToTable with multiallelic sites: instead of logging comma-separated values, log max value by default. Hidden, experimental argument -logACSum to log sum of ACs instead. This is due to extreme slowness of R in parsing strings to tokens and computing max/sum itself (~100x slower than gatk).
c) Added integrationtest for new SelectVariants commands
2011-08-24 12:25:50 -04:00
Mark DePristo
28ee6dac41
Fixed spelling mistake
2011-08-24 10:14:45 -04:00
Ryan Poplin
f37875600a
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-24 09:02:44 -04:00
Khalid Shakir
1ecbf05aae
Avoid segfaults due to out of date and possibly abandoned LSF DRMAA implementation when use'ing LSF instead of .combined_LSF_SGE
2011-08-23 23:49:36 -04:00
Mark DePristo
569e1a1089
Walker.isDone() aborts execution early
...
-- Useful if you want to have a parameter like MAX_RECORDS that wants the walker to stop after some number of map calls without having to resort to the old System.exit() call directly.
2011-08-23 16:53:06 -04:00
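The early-abort hook described in the commit above (569e1a1089) can be sketched as a traversal loop that polls the walker before each map call. The class and function names below are hypothetical Python stand-ins for the Java walker and traversal engine, shown only to illustrate the control flow.

```python
class CountingWalker:
    """Toy walker that asks to stop after max_records map calls."""

    def __init__(self, max_records):
        self.max_records = max_records
        self.seen = 0

    def map(self, record):
        self.seen += 1
        return record

    def is_done(self):
        return self.seen >= self.max_records


def traverse(walker, records):
    """Stand-in for the engine: stop cleanly once the walker reports done."""
    results = []
    for record in records:
        if walker.is_done():
            break
        results.append(walker.map(record))
    return results


print(traverse(CountingWalker(3), range(10)))
```

Unlike calling System.exit() from inside the walker, the loop exits normally, so any cleanup and final output steps in the engine still run.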
Ryan Poplin
a1a1fac9e4
Likelihood engine now gives non-zero likelihoods. Using HMM function that can handle context specific gap open and gap continuation penalties
2011-08-23 13:43:07 -04:00
Guillermo del Angel
6e2552a9ef
Merge fix
2011-08-23 12:40:43 -04:00
Guillermo del Angel
8b7a0b3b62
Two new arguments to SelectVariants to exclude either multiallelic or biallelic sites from input vcf
2011-08-23 12:40:01 -04:00
Roger Zurawicki
ac36271457
Fixed extra reads showing up in Variable Sites
...
Reads that were not hard clipped for the variable site no longer show up in output file
Walker now uses unclippedStart of Read to determine position in the sliding Window
2011-08-23 11:26:00 -04:00
Mark DePristo
6d6feb5540
Better error message when you cannot determine a ROD type because the file doesn't exist or cannot be read
2011-08-23 10:56:37 -04:00
Mauricio Carneiro
feeab6075f
Merging ReduceReads development with unstable repo
...
It is time to bring the ReadClipper class to the main repo. Read Clipper has tested functionality for soft and hard clipping reads. I will prepare thorough documentation for it as it will be very useful for the assembler and the GATK in general.
2011-08-22 23:03:03 -04:00
Guillermo del Angel
ee68713267
Further Bug fixes to CountVariants: stratifications were wrong in case genotypes had no-calls, for example if we stratified by sample and a sample had a no-call, this no-call was considered a true variant and counts were incorrectly increased
2011-08-22 20:42:47 -04:00
Guillermo del Angel
c270384b2e
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-22 20:39:32 -04:00
Guillermo del Angel
8ae24912f4
a) Misc fixes in Phase1 indel vqsr script,
...
b) More R-friendly VariantsToTable printing of AC in case of multiple alt alleles
c) Rename FixPLOrderingWalker to FixGenotypesWalker and rewrote: no longer need older code, replaced with code to replace genotypes with all-zero PL's with a no-call.
2011-08-22 20:39:06 -04:00
Mark DePristo
85c5a6f890
Merge branch 'rodTesting'
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/performance/ProfileRodSystem.java
2011-08-22 17:43:47 -04:00
Mark DePristo
1eab9be35d
Now with accurate javadoc
2011-08-22 17:25:15 -04:00
Mark DePristo
3612a3501d
info, not warn, about dynamic type determination
2011-08-22 17:24:51 -04:00
Eric Banks
dc42571dd9
Only create the genotype map when necessary
2011-08-22 15:40:36 -04:00
Khalid Shakir
c4c90c8826
Updates to JobRunners from the Queue developer community and from running the WholeGenomePipeline:
...
- Ability to pass a different resident memory reservation and limits. Useful for large pileups of low pass genome data that sometimes need high -Xmx6g but usually don't exceed 2-3g in actual heap size.
- Fixed jobPriority to work for all job runners. It must now be an integer between 0 and 100, even for GridEngine, and will be mapped to the correct values.
- Passing parallel environment and job resource requests to LSF and GridEngine. Useful for passing tokens like iodine_io=1 and -pe pe_slots 8
- Refactored GridEngine JobRunner to also provide basic support for other job dispatchers with DRMAA implementations such as Torque/PBS. Should work for basic running but advanced users must pass their own jobNativeArgs from the command line or in customized QScripts until someone maps properties like jobQueue, jobPriority, residentRequest, etc. into a Torque/PBS/etc. dispatcher.
2011-08-22 15:13:27 -04:00
Eric Banks
2c24b68a96
Working implementation of DecodeLoc for VCF parsing. Makes indexing 3x faster.
2011-08-22 15:11:21 -04:00
Eric Banks
518b3dd291
Don't let the genotypes map be null
2011-08-22 15:10:30 -04:00
Ryan Poplin
f93a554b01
updating exome specific parameters in MDCP
2011-08-21 10:25:36 -04:00
Ryan Poplin
dbff84c54e
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-21 10:09:19 -04:00
Khalid Shakir
22ca44c015
Fixed Queue's tagging of RodBindings.
...
Fixed argument definition names.
2011-08-21 02:34:20 -04:00
Eric Banks
a8cbced71b
Bug fix for Ryan: check for no context
2011-08-20 22:49:51 -04:00
Eric Banks
0ccd173967
Fixing the recent SelectVariants fix
2011-08-20 21:30:08 -04:00
Ryan Poplin
b008676878
fixing the previous fix
2011-08-20 21:21:55 -04:00
Guillermo del Angel
782453235a
Updated VariantEvalIntegrationTest since there's a new column separating nMixed and nComplex in CountVariants
...
Misc updates to WholeGenomeIndelCalling.scala
Bug fix in VariantEval (may be temporary, needs more investigation): if the -disc option is used with sites-only VCFs, a null pointer exception is produced, caused by the recent introduction of the -xl_sf options.
2011-08-20 12:24:22 -04:00
Ryan Poplin
539e157ecd
Fixing misc parameters in MDCP. The pipeline now does VariantEval of output by default. Fix for NaN vqslod values in VQSR
2011-08-20 11:28:48 -04:00
Guillermo del Angel
4939648fd4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-20 08:50:43 -04:00
Ryan Poplin
a96ecbab71
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-19 19:30:05 -04:00
Ryan Poplin
ddb5045e14
Updating the methods development calling pipeline for the new rod binding syntax and the new best practices.
2011-08-19 19:29:51 -04:00
Mark DePristo
ff018c7964
Swapped argument order but not MD5 order
2011-08-19 16:55:56 -04:00
Mark DePristo
8b3cfb2f1c
Final documented version of GATKDoclet and associated classes
...
-- Docs on everything.
-- Feature complete. At this point only minor improvements and bugfixes are anticipated
2011-08-19 16:52:17 -04:00
Mark DePristo
b08d63a6b8
Documentation and code cleanup for ClipReads, CallableLoci, and VariantsToTable
...
-- Swapped -o [summary] and -ob [bam] for more standard -o [bam] and -os [summary] arguments.
-- @Advanced arguments
2011-08-19 15:06:37 -04:00
Mark DePristo
49e831a13b
Should have checked in
2011-08-19 14:35:16 -04:00
Mauricio Carneiro
7b5fa4486d
GenotypeAndValidate - Added docs to the @Arguments
2011-08-19 13:35:11 -04:00
Mark DePristo
9f7d4beb89
Merge branch 'help'
2011-08-19 13:14:02 -04:00
Mark DePristo
4d1fd17a97
GATKDoclet cleanup and documentation
...
-- Fixed bug in the way ArgumentCollections were handled that led to failure in handling the dbsnp argument collection.
2011-08-19 13:13:41 -04:00
Ryan Poplin
0f25167efd
minor fix in VariantEval docs
2011-08-19 11:01:04 -04:00
Mark DePristo
198955f752
GATKDoc descriptions for all standard codecs, or TODO for their owners
...
-- Also added vcf.gz support in the VCF codec. This wasn't committed in the last round, because it was missed by the parallel documentation effort.
2011-08-19 09:57:21 -04:00
Guillermo del Angel
269ed1206c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-19 09:32:20 -04:00
Mark DePristo
a5e279d697
Dynamic typing of vcf.gz files
...
-- CombineVariantsIntegrationTests now use dynamic typing of vcf.gz files
-- FeatureManagerUnitTests tests for correctness.
2011-08-19 09:05:11 -04:00
Eric Banks
40e67cff1b
I like the @Advanced annotation
2011-08-18 22:27:34 -04:00
Mark DePristo
2457c7b8f5
Merge branch 'master' into help
2011-08-18 22:20:43 -04:00
Mark DePristo
5fbdf968f7
ArgumentSource no longer comparable. Arguments sorted by GATKDoclet
2011-08-18 22:20:14 -04:00
Eric Banks
77fa2c1546
Renaming read filters with a superfluous 'Read' in their names. Kept the ones that made sense to have it (e.g. MalformedReadFilter).
2011-08-18 22:01:33 -04:00
Mark DePristo
1d3799ddf7
Merge branch 'master' into help
2011-08-18 22:00:29 -04:00
Mark DePristo
d1892cd0d7
Bug fixes
...
-- Sorting of ArgumentSources now done in GATKDoclet, not in the ParsingEngine, as the system depends on the LinkedTreeMap
-- Fixed broken exception throwing in the case where a file's type could not be determined
2011-08-18 21:58:36 -04:00
Mark DePristo
c5efb6f40e
Usability improvements to GATKDocs
...
-- ArgumentSources are now sorted by case insensitive names, so arguments are shown in alphabetical order (Ryan)
-- @Advanced annotation can be used to indicate that an argument is an advanced option and should be visually deemphasized in the GATKDocs. There's now an advanced section. Mauricio or Ryan -- could you figure out how to make this section less prominent in the style.css?
2011-08-18 21:39:11 -04:00
Mark DePristo
d94da0b1cf
Moved CG and SOAP codecs to private
2011-08-18 21:20:26 -04:00
Mark DePristo
f7414e39bc
Improvements to GATKDocs
...
-- Allowed values for RodBinding<T> are displayed in the GATKDocs
-- Longest name up to 30 characters is chosen for main argument list (suggested by Ryan/Mauricio)
-- Features are listed in alphabetical order
-- Moved useful getParameterizedType() function to JVMUtils
-- Tests of these features in the Documentation Test
2011-08-18 21:20:09 -04:00
Ryan Poplin
09d099cada
Added GATKDocs to the UnifiedGenotyper.
2011-08-18 20:57:02 -04:00
Mauricio Carneiro
6ef01e40b8
Complete rewrite of Hard Clipping (ReadClipper)
...
Hard clipping is now completely independent from softclipping and plows through previously hard or soft clipped reads.
2011-08-18 18:35:45 -04:00
Guillermo del Angel
626cbf9411
Bug fixes and cleanups for IndelStatistics
2011-08-18 16:28:40 -04:00
Guillermo del Angel
58560a6d50
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 16:17:52 -04:00
Guillermo del Angel
3dfb60a46e
Fixing up and refactoring usage of indel categories. On a variant context, isInsertion() and isDeletion() are now removed because behavior before was wrong in case of multiallelic sites. Now, methods isSimpleInsertion() and isSimpleDeletion() will return true only if sites are biallelic. For multiallelic sites, isComplex() will return true in all cases.
...
VariantEval module CountVariants is corrected and an additional column is added so that we log mixed events and complex indels separately (before, they were being conflated).
VariantEval module IndelStatistics is considerably simplified, as the sample stratification was wrong and redundant; it should now work with the VE-generic Sample stratification. Several columns are renamed or removed since they're not really useful.
2011-08-18 16:17:38 -04:00
Chris Hartl
6b256a8ac5
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/chartl/dev/git
2011-08-18 15:29:24 -04:00
Chris Hartl
a8935c99fc
Adding docs for DepthOfCoverage and ValidationAmplicons
2011-08-18 15:28:35 -04:00
Mark DePristo
f2f51e35e3
Merge branch 'master' into help
2011-08-18 14:05:33 -04:00
Mark DePristo
faa3f8b6f6
Only concrete classes are now documented
2011-08-18 14:04:47 -04:00
Ryan Poplin
7c4ce6d969
Added GATKDocs for the VQSR walkers.
2011-08-18 14:00:39 -04:00
Mark DePristo
5772766dd5
Improvements to GATKDocs
...
-- Now supports a static list of root classes / interfaces that should receive docs. A complementary approach to documenting features to the DocumentedGATKFeature annotation
-- Tribble codecs are now documented!
-- No longer displays sub and super classes
2011-08-18 14:00:09 -04:00
Mark DePristo
e03db30ca0
Now uses DocumentedGATKFeatureObject instead of annotation directly
...
-- Step 1 on the way to creating a static list of additional classes that we want to document.
2011-08-18 12:31:04 -04:00
Mark DePristo
d4511807ed
Merge branch 'master' into help
2011-08-18 11:53:37 -04:00
Mark DePristo
c787fd0b70
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 11:52:45 -04:00
Mark DePristo
c797616c65
If you have one sample in your BAM, getToolkit().getSamples().size() == 2
...
Also deleted double initialization, where a line of code was duplicated in creating the GATK engine.
2011-08-18 11:51:53 -04:00
Mark DePristo
cbec69a130
Merge branch 'master' into help
...
Conflicts:
public/java/src/org/broadinstitute/sting/utils/help/HelpUtils.java
2011-08-18 11:33:27 -04:00
Eric Banks
aa21fc7c9c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 11:30:59 -04:00
Mark DePristo
f5d7cabb20
Fix for reintroducing an already solved problem.
2011-08-18 11:20:12 -04:00
Eric Banks
a45498150a
Remove non-ascii char
2011-08-18 11:18:29 -04:00
Ryan Poplin
c08a9964d4
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 10:58:04 -04:00
Ryan Poplin
bb79d3edae
Added GATKDocs for the BQSR walkers.
2011-08-18 10:57:48 -04:00
Mark DePristo
47bbddb724
Now provides type-specific user feedback
...
For RodBinding<VariantContext> error messages now list only the Tribble types that produce VariantContexts
2011-08-18 10:47:16 -04:00
Mark DePristo
2d41ba15a4
Vastly better Tribble help message
...
Here's a new example:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.1-520-g76495cd):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to parse value /humgen/gsa-hpprojects/GATK/data/refGene_b37.filtered.sorted.txt for argument refSeqRodBinding. Message: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :TYPE listing the correct type from among the supported types:
##### ERROR Name         FeatureType       Documentation
##### ERROR BEAGLE       BeagleFeature     http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR BED          BEDFeature        http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_bed_BEDCodec.html
##### ERROR BEDTABLE     TableFeature      http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR CGVAR        VariantContext    http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_completegenomics_CGVarCodec.html
##### ERROR DBSNP        DbSNPFeature      http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_dbsnp_DbSNPCodec.html
##### ERROR GELITEXT     GeliTextFeature   http://www.broadinstitute.org/gsa/gatkdocs/release/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR MAF          MafFeature        http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_features_maf_MafCodec.html
##### ERROR MILLSDEVINE  VariantContext    http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_MillsDevineCodec.html
##### ERROR RAWHAPMAP    RawHapMapFeature  http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR REFSEQ       RefSeqFeature     http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR SAMPILEUP    SAMPileupFeature  http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR SAMREAD      SAMReadFeature    http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR SNPEFF       SnpEffFeature     http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_snpEff_SnpEffCodec.html
##### ERROR SOAPSNP      VariantContext    http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_soapsnp_SoapSNPCodec.html
##### ERROR TABLE        TableFeature      http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR VCF          VariantContext    http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR VCF3         VariantContext    http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
2011-08-18 10:31:32 -04:00
Mark DePristo
c2287c93d7
Cleanup of codec locations. No more dbSNPHelper
...
-- refdata/features now in utils/codecs with the other codecs
-- Deleted dbsnpHelper. rsID function now in VCFutils. Remaining code either deleted or put into VariantContextAdaptors
-- Many associated import updates due to code move
2011-08-18 10:02:46 -04:00
Mark DePristo
9c17d54cb6
getFeatureClass() now returns Class<T> not Class to avoid yesterday's runtime error
2011-08-18 09:39:20 -04:00
Mark DePristo
c30e1db744
Better location for help utils
2011-08-18 09:38:51 -04:00
Mark DePristo
4da42d9f39
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-18 09:32:57 -04:00
Eric Banks
c91a442be1
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 22:40:16 -04:00
Eric Banks
b75a1807e3
Adding integration test to cover sample exclusion
2011-08-17 22:40:09 -04:00
Eric Banks
a7b70e6bb4
Adding feature for Khalid: ability to exclude particular samples.
2011-08-17 22:28:22 -04:00
Mauricio Carneiro
cc3df8f11a
Moving GAV walker to public
...
Walker is updated to the new RodBinding system and has the new GATKDocs layout.
2011-08-17 21:55:17 -04:00
Eric Banks
fa1db3913b
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 21:49:25 -04:00
Eric Banks
8e83b6646b
Bug fix for Chris: don't validate ref base for complex events.
2011-08-17 21:49:14 -04:00
Matt Hanna
c104dd7a09
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 16:59:12 -04:00
Matt Hanna
81a792afeb
Reverting optimization disable in unstable.
2011-08-17 16:58:24 -04:00
Mark DePristo
2e35592295
GATKDocs for CallableLoci
2011-08-17 16:32:01 -04:00
Guillermo del Angel
c193f52e5d
Fixed up examples: pasting from wiki still had old rod syntax
2011-08-17 16:29:45 -04:00
Matt Hanna
2b2a4e0795
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-17 16:26:45 -04:00
Matt Hanna
297c9e513c
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable into unstable
2011-08-17 16:24:02 -04:00
Matt Hanna
a210a62ab9
Merged bug fix from Stable into Unstable
2011-08-17 16:23:31 -04:00
Mark DePristo
d59e6ed274
Fix for RefSeqCodec bug and better error messages
...
-- RefSeqCodec bug: getFeatureClass() returned RefSeqCodec.class, not RefSeqFeature.class. Really should change this in Tribble to require Class<T extends Feature> to get compile time type checking
-- Better error messages that actually list the available tribble types, when there's a type error
2011-08-17 16:22:07 -04:00
Matt Hanna
d170187896
Disable optimization that increases marginal speed of the GATK slightly but
...
can produce data loss in a narrow corner case where the BGZF block(s) locations
and offsets in the last index bucket of contig n overlap exactly with the BGZF
block locations and offset in the last index bucket of contig n+1.
A proper fix that keeps the optimization has already been introduced into
unstable, but disabling the optimization is a low risk way to make sure that
users of stable experience no data loss.
2011-08-17 16:16:05 -04:00
David Roazen
53006da9a5
Improved descriptions for the SnpEff annotations in the VCF header
...
(based on Eric's feedback).
2011-08-17 16:09:10 -04:00
Guillermo del Angel
784fb148b9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 15:47:01 -04:00
Guillermo del Angel
671330950d
Updated Beagle walker for gatkdocs format. Pushed unsupported, undocumented arguments to @Hidden
2011-08-17 15:46:31 -04:00
Andrey Sivachenko
0af68e052a
Merge branch 'master' of ssh://cga1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 15:17:47 -04:00
Andrey Sivachenko
a423546cdd
fix: RefSeq contains records with zero coding length and the refseq codec/feature used to crash on those; now such records are ignored, with a warning printed (once)
2011-08-17 15:17:31 -04:00
Andrey Sivachenko
710d34633e
now the reads that are too long are truly ignored (fix of the fix)
2011-08-17 15:16:23 -04:00
Eric Banks
2f19046f0c
Adding docs to the 2 beasts. Saved the worst for last.
2011-08-17 14:19:14 -04:00
Andrey Sivachenko
069554efe5
somatic indel detector no longer dies on reads that are too long (likely containing a huge deletion); instead it prints a warning and ignores the read
2011-08-17 14:05:19 -04:00
Eric Banks
c405a75f54
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 13:28:25 -04:00
Eric Banks
575303ae6b
Renaming for consistency and bringing up to speed with new rod system
2011-08-17 13:28:19 -04:00
Eric Banks
6d629c176c
Adding docs
2011-08-17 13:27:36 -04:00
Eric Banks
a21e193a9e
Adding docs to 3 more walkers
2011-08-17 12:35:08 -04:00
Menachem Fromer
98acb546a9
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-17 12:22:29 -04:00
Menachem Fromer
d1bb302d12
Added GatkDocs documentation
2011-08-17 12:21:37 -04:00
Mark DePristo
3da71a9bb6
Clean up summary
2011-08-17 12:04:45 -04:00
Mark DePristo
c6fb215faf
GATKDocs for VariantsToTable
...
-- Made a previously required argument optional, as this was a long-standing bug
2011-08-17 12:02:41 -04:00
Mark DePristo
5f794d16a7
Fixed bad character in documentation
2011-08-17 12:01:08 -04:00
Mark DePristo
9d1d5bd27a
Revert "Fixed bad character in documentation"
...
This reverts commit a1f50c82d3cb25e5e83d36e9054d74cdee957d87.
2011-08-17 11:57:31 -04:00
Mark DePristo
78deb3f195
Fixed bad character in documentation
2011-08-17 11:57:00 -04:00
Mark DePristo
79dcfca25f
Fixed bad character in documentation
2011-08-17 11:56:51 -04:00
Eric Banks
b3b5d608ca
Adding docs to yet more walkers
2011-08-17 09:57:19 -04:00
Eric Banks
fadcbf68fd
Adding docs to QC walkers
2011-08-17 09:39:33 -04:00
Mauricio Carneiro
5d6a6fab98
Renamed softUnclipped functions to refCoord*
...
These functions return reference coordinates, so they should be named accordingly.
2011-08-16 18:56:28 -04:00
Mauricio Carneiro
ed8f769dce
Fixed index for getSoftUnclippedEnd()
...
Unclipped end can be calculated simply by looking at the last cigar element and adding its length in case it's a soft clip.
2011-08-16 18:54:28 -04:00
Eric Banks
5f3f46aad1
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-16 16:26:33 -04:00
Eric Banks
946f5c53fe
Adding docs to more walkers
2011-08-16 16:26:26 -04:00
Mark DePristo
6e828260a0
Removed -B support. Now explodes with error if -B provided.
2011-08-16 16:13:47 -04:00
Ryan Poplin
2d5bbecd9e
Merged bug fix from Stable into Unstable
2011-08-16 14:19:04 -04:00
Mauricio Carneiro
07c1e113cd
Fixed interval traversal for previously hard clipped reads.
...
If a read was hard clipped for being low quality and no longer overlaps the interval, this read will now be discarded instead of treated as an error by the GATK traversal engine.
2011-08-16 14:18:05 -04:00
Ryan Poplin
9d4add3268
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2011-08-16 14:18:03 -04:00
Ryan Poplin
170d1ff7b6
Fix in UG for trying to call indels at IUPAC code bases when in EMIT_ALL_SITES mode
2011-08-16 14:17:46 -04:00
Mauricio Carneiro
b135565183
Added low quality clipping
...
Clips both tails of a read if the tails are below a given quality threshold (default Q2).
*Added special treatment for reads that get completely clipped.
2011-08-16 13:51:25 -04:00
Andrey Sivachenko
9f3328db53
fixing read group name collision: before writing the read into respective stream in nway-out mode we now retrieve the original rg, not the merged/modified one
2011-08-16 13:45:40 -04:00
Eric Banks
ab0b56ed11
Minor doc fixes
2011-08-16 12:55:45 -04:00
Eric Banks
125ad0bcfa
Added docs to RTC
2011-08-16 12:46:48 -04:00
Eric Banks
ef9216011e
Added docs to IR
2011-08-16 12:24:53 -04:00
Eric Banks
ab1e3d6a98
Use the right set of sample names
2011-08-16 01:03:05 -04:00
Eric Banks
36c7f83208
Refactoring VE stratifications so that they don't pass around bulky data; instead just pull needed data from the VE parent. This allows us to stop using deprecated features of the rod system.
2011-08-15 16:31:57 -04:00
Eric Banks
1246b89049
Forgot to initialize variants on the merge
2011-08-15 16:00:43 -04:00
Mauricio Carneiro
993ecb85da
Added Hard Clipping Tail Ends
...
Added functionality to hard clip the low quality tail ends of reads (lowQual <= 2)
2011-08-15 15:22:54 -04:00
Eric Banks
045e8a045e
Updating random walkers to new rod system; removing unused GenotypeAndValidateWalker
2011-08-15 14:05:23 -04:00
Eric Banks
fc2c21433b
Updating random walkers to new rod system
2011-08-15 13:29:31 -04:00
Eric Banks
3d56bbf087
Resolving merge conflicts
2011-08-15 12:28:05 -04:00
Eric Banks
9ddbfdcb9f
Check filtered status before applying to alt reference
2011-08-15 12:25:23 -04:00
Mauricio Carneiro
0d976d6211
Fixed second time clipping
...
When a read is clipped once, and then in the second operation, because of indels, it doesn't reach the coordinate initially set for hard clipping, the indices were wrong. This should fix it.
2011-08-15 12:04:53 -04:00
Mauricio Carneiro
489c15b99d
Fixed indexing issue in coordinate conversion
...
When a read had been previously soft clipped, the UnclippedEnd could not be used directly as the Reference Coordinate for clipping, because the read does not go that far.
2011-08-15 01:42:34 -04:00
Mauricio Carneiro
c7b69a4574
Fixed integration tests
2011-08-14 16:38:20 -04:00
Mauricio Carneiro
6ae3f9e322
Wrapped clipping op information
...
The clipping op extra information being kept by this walker was specific to the walker, not to the read clipper. Created a wrapper ReadClipperWithData class that keeps the extra information and leaves the ReadClipper slim.
(This is a quick commit to unbreak the build; integration tests are running and further commits will follow if necessary.)
2011-08-14 15:44:48 -04:00
Mauricio Carneiro
8a51732049
Fixes to ReadClipper and added Reference Coordinate clipping.
...
* Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels.
* Soft clipping was messing up the cigar string if there was already a hard clip at the beginning of the read. Fixed.
* hard clipping now works with previously hard clipped reads.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro
291d8c7596
Fixed HardClipping and Interval containment
...
* Hard clipping was wrongfully hard clipping unmapped reads while soft clipping then hard clipping mapped reads. Now we throw an exception if we try to hard/soft clip unmapped reads and use the soft->hard clip procedure for every mapped read.
* Interval containment needed a <= and >= to make sure it caught the borders right.
2011-08-14 14:54:33 -04:00
Mauricio Carneiro
0be1dacddb
Refactored interval clipping utility
...
reads are clipped in map() and now we cover almost all cases. Left behind the case where the read stretches through two intervals. This will need special treatment later.
2011-08-14 14:54:33 -04:00
David Roazen
9d2cda3d41
Removed a public -> private dependency in our test suite.
2011-08-12 17:29:10 -04:00
David Roazen
bb4ced3201
SnpEff-related fixes.
...
-To correctly handle indels and MNPs, only consider features that start at the current locus,
rather than features that span the current locus, when selecting the most significant effect.
-Throw a UserException when a SnpEff rodbinding is not provided instead of simply not adding
any annotations and silently returning.
2011-08-12 15:26:24 -04:00
Mauricio Carneiro
10e873d9c6
Merge branch 'repval'
2011-08-12 15:24:31 -04:00
Guillermo del Angel
31dc831531
Merged bug fix from Stable into Unstable
2011-08-12 13:26:41 -04:00
Menachem Fromer
9121b8ed65
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-12 12:24:19 -04:00
Menachem Fromer
7ed120361d
Fixed bug that required symbolic alleles to be padded with reference base and added integration test to test parsing and output of symbolic alleles
2011-08-12 12:23:44 -04:00
Eric Banks
7ea9196321
Better error message for name/type clashes.
2011-08-12 11:18:14 -04:00
Eric Banks
27f0748b33
Renaming the HapMap codec and feature to RawHapMap so that we don't get esoteric errors when trying to bind a rod with the name 'hapmap' (since it was also a feature).
2011-08-12 11:11:56 -04:00
Eric Banks
005bd71be3
Working too quickly earlier. Fixing syntax.
2011-08-12 10:29:36 -04:00
Menachem Fromer
c7ca33cbff
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-12 10:12:09 -04:00
Eric Banks
639a01f382
Updating integration test now that VE has been updated
2011-08-12 07:15:08 -04:00
Eric Banks
41f3da75d7
Implementation in VE was confusing 'variant' status vs. 'polymorphic' status. This led to issues because we now match types of eval and comp; specifically, subsetting a VC to a monomorphic sample can't change the 'variant' status of the VC (it's still a variant site or otherwise we'll never match the comps, which breaks GenotypeConcordance). CountVariants really got this wrong. Fixed. VE now passes all integration tests.
2011-08-12 02:22:44 -04:00
Eric Banks
45f973ab1f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-12 00:40:18 -04:00
Eric Banks
eba316621d
Finish moving VE over to new rod system and fixing up the type inconsistency between eval and comp rods. Now the novel count is always 0 under the known stratification. :)
2011-08-12 00:40:08 -04:00
Menachem Fromer
9de06560df
Update to new RodBinding system
2011-08-11 17:54:16 -04:00
Ryan Poplin
f1d1252be2
Fixing syntax of BQSR and UG performance tests.
2011-08-11 17:04:09 -04:00
Ryan Poplin
902eb0c61e
Adding dbsnp annotation back into the UG integration tests
2011-08-11 13:55:03 -04:00
Eric Banks
90771b74b4
When matching eval to comps, try to choose the one with the same alt allele.
2011-08-11 13:55:01 -04:00
Eric Banks
200f73b008
No reason to warn the user anymore because it's no longer possible for them to specify a dbsnp file on the command-line.
2011-08-11 13:44:07 -04:00
Eric Banks
e93538cdf7
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 13:39:36 -04:00
Eric Banks
265c3d744b
Fixing VariantEval logic and having it use the new rod system.
2011-08-11 13:39:34 -04:00
Ryan Poplin
b705d9cf15
Oops, these VariantAnnotator input bindings aren't needed during the UG
2011-08-11 13:17:16 -04:00
Ryan Poplin
7fade88070
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 11:02:47 -04:00
Ryan Poplin
c7b9a9ef0a
Updating UnifiedGenotyper to use the new rod binding system.
2011-08-11 11:02:11 -04:00
Mark DePristo
418a4d541f
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 11:01:38 -04:00
Mark DePristo
e71255d3c2
GATKDocsExample walker
...
-- Shows the best practice for documenting a walker with the GATKdocs
-- See http://www.broadinstitute.org/gsa/wiki/index.php/GATKdocs#Writing_GATKdocs_for_your_walkers for a brief discussion
2011-08-11 11:01:21 -04:00
Ryan Poplin
79c86e211f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 09:59:20 -04:00
Ryan Poplin
ea42ee4a95
Updating BQSR for the new rod binding system.
2011-08-11 09:58:42 -04:00
Mark DePristo
8cdc0cbd9c
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-11 08:58:49 -04:00
Mark DePristo
40e06f9afb
Fixed broken RodBinding defaults.
...
-- Verified now to be correct at runtime
-- UnitTest covers this
-- createTypeDefault now takes a Type, not a Class, so that parameterized classes can have their parameter fetched in the defaults.
2011-08-11 08:58:30 -04:00
Ryan Poplin
dd5fe8291d
Fixing up some comments in the BQSR
2011-08-11 08:36:00 -04:00
Eric Banks
f1b09db39e
Fixes for rod bindings
2011-08-10 23:08:47 -04:00
Eric Banks
75985c2fa0
Resolving merge conflicts
2011-08-10 22:45:11 -04:00
Eric Banks
bdb1da30fd
Better interface for getting RodBindings to the VariantAnnotatorEngine and its annotations: pass around an AnnotatorCompatibleWalker (interface) object. Updating VA to use the new rod system.
2011-08-10 22:43:08 -04:00
Mark DePristo
0086e27741
makeUnbound now package protected
...
-- Removed references to it in the codebase
-- Fixed documentation I saw that had the summary + body style
2011-08-10 22:29:32 -04:00
Mark DePristo
cb6cf25bb0
Updating SelectVariants documentation to reflect best practice
2011-08-10 22:24:18 -04:00
Mark DePristo
00b4d6ec57
Updated the best practice on documenting a field
...
-- Best practice is now to skip the summary, as this is the @annotation doc value.
2011-08-10 22:21:12 -04:00
Mark DePristo
2007d2fcad
Better documentation for default value fields
...
-- DocString function for types that create default outputs "stdout"
-- RodBinding now creates a makeUnbound default value automatically for you if your RodBinding isn't required
-- Removed warning about sparse help from TextFormattingUtils
2011-08-10 22:16:22 -04:00
Mauricio Carneiro
bb557266ca
Merge branches to get new RodBinding framework
...
Conflicts:
private/java/src/org/broadinstitute/sting/gatk/walkers/replication_validation/ReplicationValidationWalker.java
2011-08-10 18:23:01 -04:00
Guillermo del Angel
8325cb8c26
Fixing up apparent source control/merge snafu: fix to correctly output PL ordering in multi-allelic sites by UG was only half-committed and hence not working. This completes the fix.
2011-08-10 15:31:49 -04:00
Eric Banks
07ad8c78a9
More tools moved over. Fixed the VariantContextIntegrationTest which was not useful because the md5s were all removed. In the future, instead of removing md5s (putting it in 'parameterization' mode), you should instead use @Test(enabled=false) since it's easier to track.
2011-08-10 14:24:40 -04:00
Eric Banks
8d14d32a62
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-08-10 13:42:37 -04:00
Eric Banks
749c8bfbcd
Moving more tools over to the new rod system
2011-08-10 13:42:35 -04:00
David Roazen
0497170bc9
SnpEffCodec now implements SelfScopingFeatureCodec so that we no longer have to specify the codec name on the command line for SnpEff files.
2011-08-10 13:12:09 -04:00