depristo
84b6d2926b
Useful walker that creates a new interval list containing only the intervals that overlap an input sites list. Really a one-off walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4559 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 19:55:04 +00:00
depristo
78b4a1c240
VariantsToTable now supports the virtual TRANSITION field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4558 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 19:53:46 +00:00
hanna
e6d61197e6
Disable OTF indexing when writing indices for temporary VCFs when running
with -nt option. When last I checked in, Ryan was seeing a ~25% speedup
per shard by not indexing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4556 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 17:40:37 +00:00
depristo
e6b008f87c
Fixed >= vs. > test leading to failure to tolerate dynamic indexes that are created at *exactly* the instant the output VCF is closed too
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4555 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 16:11:14 +00:00
ebanks
72c5b75460
Tribble exceptions can be generated outside of the normal codec parsing code because we now lazy load the VCF genotype fields. I'm not sure how else to account for this (to make sure they show up as user errors and not GATK system errors) besides catching them here.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4554 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:22:17 +00:00
delangel
e24f7fec47
Fixed indel genotyper which broke yet again because we can't just call context.getBasePileup() without checking again for its existence in the first place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4553 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:17:11 +00:00
ebanks
c0b4317311
Er, here's the right fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4552 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 15:08:25 +00:00
ebanks
181f901126
Fix for Ryan: don't pull reference sequence for the portions of reads that extend beyond the contig boundaries
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4551 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 14:38:26 +00:00
ebanks
9f76aed515
Fix for IDs 5zP7jJeffK2sdPH1BH4JBVSrQztVEDKP and nX0cuBjoqBW4NQFpM6dE13KpkCuYFpZu
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4550 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 14:05:27 +00:00
hanna
d4feb99d9a
For parallel ROD traversals, simplified reference sharding. Will replace
with a more sensible strategy for sharding w/o BAMs at some point after
ASHG.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4549 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-22 05:08:15 +00:00
fromer
9ba7269728
Fixed Integration Tests to output VCF files with -NO_HEADER
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4548 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:49:44 +00:00
fromer
60f88866dd
Uses VCFConstants instead of hard-coded constants
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4547 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:49:01 +00:00
fromer
883b8ff80e
Removed flush() method from VCFWriter interface; added takeOwnershipOfInner parameter in constructor of wrapper VCFWriters to designate if the Writer should close the inner Writer it receives on construction
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4546 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:48:00 +00:00
fromer
1ea43be976
Removed flush() method from VCFWriter interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4545 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 19:46:42 +00:00
chartl
3566ad2146
Wrong if statement.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4544 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 17:37:45 +00:00
chartl
bf17f92b64
Do not look for samples in dbsnp binding
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4543 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 17:36:38 +00:00
ebanks
225cf49128
Implementing reference confidence estimate in UGv2 as per UGv1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4542 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 16:57:59 +00:00
delangel
cf9c9ae241
Three important updates for Dindel genotyper:
a) Fix it up because it broke with a recent checkin to annotate vcf with unfiltered depth.
b) Printout of ref/alt alleles in output vcf was incorrect because the start/stop positions of associated GenomeLoc were incorrectly computed in case of a deletion.
c) Redid Beagle input/output walkers so they no longer assume that the ref is a single base or that the variant is a VCF, and generalized them to be indel-capable, so now the Beagle walkers can be used for indels as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4541 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 16:00:16 +00:00
kshakir
b88cfd2939
Updated MD5s of VCFs, since the approximate command line arguments injected into the VCF headers now have a little more order to them thanks to changes in the ParsingEngine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4538 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 03:07:40 +00:00
ebanks
8f38ebf98e
Throw a user exception when using the clustered SNP filter in the presence of ref calls. It's unfortunate, but until we get a windowed ROD context this is just too much of a headache to support.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4537 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-21 02:44:10 +00:00
kshakir
88a0d77433
Changed parsing engine to store the order of the argument bindings based on their definition in the class, moving "-T" to the front of Queue command lines.
Queue GATK generated .intervals is now a List(File) again removing special case handling in the generator.
Instead of using @Scatter annotation, using ScatterFunction instance to determine if a job can be scattered.
Implemented special VcfGatherFunction which only uses the header from the first file, even if the other files differ in their headers.
Added a -deleteIntermediates to Queue to delete the outputs from intermediate commands after a successful run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4536 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 21:43:52 +00:00
ebanks
91049269c2
Optimizations across the board, with help from Guillermo, Matt, and JProfiler. Too tired to give details now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4535 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 20:47:41 +00:00
fromer
f76865abbc
ReadBackedPhasing now uses a SortedVCFWriter to simplify, and has the ability to merge phased SNPs into MNPs on the fly [turned off by default]; MergeSegregatingPolymorphismsWalker can also do this as a post-processing step; Integration tests for MergeSegregatingPolymorphismsWalker were also added
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4534 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 20:27:10 +00:00
fromer
e8079399ac
Added flush() method to VCFWriters
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4533 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 20:23:22 +00:00
fromer
00726b6c4b
Added mergeIntoMNPs to merge successive VCF records into a single MNP VCF [if possible]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4532 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 19:40:26 +00:00
fromer
55230ce5f3
Added startsBefore, startsAfter, and minDistance [calculates distance between any pair of bases in the two GenomeLocs]
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4531 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 19:12:34 +00:00
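As an illustration of the startsBefore / startsAfter / minDistance helpers described in the entry above, here is a minimal Python sketch. It is not the GATK Java code. It assumes closed intervals on a single contig and reads "distance between any pair of bases" as the smallest such distance; the `Loc` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """Hypothetical stand-in for a GenomeLoc: a closed interval on one contig."""
    contig: str
    start: int
    stop: int


def _require_same_contig(a: Loc, b: Loc) -> None:
    # Assumption: comparisons are only defined within one contig in this sketch.
    if a.contig != b.contig:
        raise ValueError("locations are on different contigs")


def starts_before(a: Loc, b: Loc) -> bool:
    _require_same_contig(a, b)
    return a.start < b.start


def starts_after(a: Loc, b: Loc) -> bool:
    _require_same_contig(a, b)
    return a.start > b.start


def min_distance(a: Loc, b: Loc) -> int:
    """Smallest distance between a base in a and a base in b (0 if they overlap)."""
    _require_same_contig(a, b)
    if a.stop < b.start:      # a lies entirely to the left of b
        return b.start - a.stop
    if b.stop < a.start:      # b lies entirely to the left of a
        return a.start - b.stop
    return 0                  # intervals share at least one base
```

Under these assumptions, two intervals that share a base have distance 0, and adjacent intervals have distance 1.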
ebanks
4f77581087
More optimizations for HaplotypeScore: pulling final constants out of loops
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4530 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 17:40:57 +00:00
hanna
20fac43521
Add extra logging to the GATK run report at the start of metrics aggregation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4529 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 17:32:51 +00:00
ebanks
a205900eff
Naughty use of Strings in HaplotypeScore literally doubled the runtime of Unified Genotyper. Moved over to bytes and no longer allow Strings in the Haplotype util class. New round of profiling on tap for tomorrow.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4528 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 03:32:21 +00:00
depristo
f9541b78d3
Timing of traversal now starts at the start of the traversal, so the rate is reasonable right off the bat. For example, we now see: INFO 22:45:02,476 TraversalEngine - [TRAVERSAL STARTING]; INFO 22:45:32,484 TraversalEngine - [PROGRESS] Traversed to 2:50850686, processing 18,646 sites in 30.05 secs (1611.50 secs per 1M sites)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4527 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 02:47:34 +00:00
depristo
f7ce18553e
GenotypeConcordance now prints interesting sites more nicely. RMDTrackBuilder now uses the root class FeatureSource, not BasicFeatureSource.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4525 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-20 00:29:02 +00:00
ebanks
7a291a8ff3
First pass at a VCF validator. Will test more tonight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4524 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 19:55:49 +00:00
chartl
341e93ee12
The reference fixer seems to have munged the OMNI rather than making it better. Looks like some sites need to only have the ref and alt bases swapped, and others need to have the genotypes swapped as well? E.g.
some subset need
A C 1/1 --> C A 0/0
while another subset need
A C 1/1 --> C A 1/1
it's unclear how big these subsets are (or even if one is empty). What I do know is, doing the first one totally screws up concordance metrics for the 421-sample chip. So either something else needs to be done, or there's a bug in this walker. Until I know for sure, I've added an initialize exception to disable this thing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4523 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 12:50:24 +00:00
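The two candidate corrections described in the entry above can be sketched as follows. This is a hypothetical Python illustration for biallelic sites with unphased "a/b" genotype strings, not the walker's actual code; the function name and the `flip_genotype` flag are assumptions.

```python
def swap_ref_alt(ref: str, alt: str, genotype: str, flip_genotype: bool):
    """Swap ref and alt; optionally re-index the genotype to match.

    flip_genotype=True  models  A C 1/1 --> C A 0/0
    flip_genotype=False models  A C 1/1 --> C A 1/1
    """
    alleles = genotype.split("/")
    if flip_genotype:
        # Exchange allele indices 0 and 1; leave anything else (e.g. ".") alone.
        flip = {"0": "1", "1": "0"}
        alleles = [flip.get(a, a) for a in alleles]
    return alt, ref, "/".join(alleles)
```

With flipping, the sample still carries the same bases as before the swap; without it, the genotype indices are kept and now point at the other base.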
ebanks
5251f49a90
Including Marian Thieme's BaseCounts class (with some modifications)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4522 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 03:07:30 +00:00
hanna
c5f105d050
Fix boneheaded mistake in the new interval filtering code I added on Sunday.
Sorry everyone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4521 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-19 01:20:12 +00:00
ebanks
524cb8257c
Renaming for accuracy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4519 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 18:11:07 +00:00
ebanks
0fe504b748
Use filtered depth for Exact model (just like grid search)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4518 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 18:08:31 +00:00
ebanks
d54d9880d7
Now that G's new genotyping algorithm is live, I've cleaned up the code to completely separate the grid search from the exact model. AlleleFrequencyCalculationModel is now completely abstract.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4517 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 18:04:06 +00:00
ebanks
80e5ac65b4
CAP_BASE_QUALITY needs to be included in the clone() method for it to be usable in UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4516 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-18 03:11:03 +00:00
hanna
6af9532090
Fix for GATK slowdowns at the ends of intervals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4514 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 23:21:23 +00:00
chartl
5889138f4a
*facepalm*
forgot to add the samples to the header. How could the VCFWriter let me get away with something so boneheaded?!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4513 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 05:36:29 +00:00
chartl
2bc5971ca1
Added - a tool to fix reference bases of a VCF. The OMNI had a couple of sites with incorrect reference bases (look to be legacy from other chips), and a few more that had ref and alt flipped. GAP should probably take care of it, but since I need results by monday, I'm doing it.
Modified - SelectVariants: Hook up to VariantContextUtils to recalculate AC/AF/AN, which uses the accessor in VariantContext to do this. Somehow sites that were selected down to hom-ref genotypes only wound up getting positive AC.
**IMPORTANT** I kind of need input here. The header of a file used for an integration test specifies AC as being an integer. Recalculating it casts it into an integer list (which it should be, as it allows for alternate alleles). However, this appears to clash with what the jexl expression is looking for? For now, the integration test itself needed to be changed. It's unclear what to do when the header specifies AC as being one class but recalculating it casts it to another class.
I'm committing my omni_qc pipeline because I'm almost certain 2 months down the road I'm going to wonder what the heck I did to generate my results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4511 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 03:18:01 +00:00
ebanks
7aa030a9a4
Hmm. Apparently variants can get lifted over to different chromosomes. Who knew? Reverting changes from a couple of days ago. The only way to do this correctly (without requiring lots of memory) is to turn off on-the-fly indexing for this walker. Integration tests cover this now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4510 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-17 02:54:12 +00:00
chartl
8b2d387643
Added in an eval module that calculates the dispersion histograms between eval and comp (e.g. M_{i,j} = # of times eval observed to have AC i, comp AC j -- for af it's i/100 vs j/100 )
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4507 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 19:07:43 +00:00
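The dispersion histogram described in the entry above, M_{i,j} = number of times eval has AC i while comp has AC j, can be sketched in a few lines of Python. This is an illustration only, not the eval module itself; the function names are hypothetical, and binning AF by rounding to the nearest hundredth is an assumption about what "i/100 vs j/100" means.

```python
from collections import Counter
from typing import Iterable, Tuple


def dispersion_counts(pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Return M where M[(i, j)] counts sites with eval value i and comp value j."""
    return Counter(pairs)


def af_bin(af: float, bins: int = 100) -> int:
    """Map an allele frequency in [0, 1] to an integer bin (assumed: hundredths)."""
    return int(round(af * bins))


def af_dispersion_counts(pairs: Iterable[Tuple[float, float]]) -> Counter:
    """Same histogram, but keyed on binned allele frequencies instead of AC."""
    return Counter((af_bin(e), af_bin(c)) for e, c in pairs)
```

A `Counter` returns 0 for (i, j) pairs that were never observed, so the matrix can stay sparse.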
ebanks
f78ff08e2b
This is less correct than my previous change but it's what UGv1 does and now is not the right time to start mucking with things.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4506 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 18:56:45 +00:00
ebanks
471c18054f
Fix for SB calculation: the best overall AF might not have any mass when just looking at reads from a single strand. We need to compute the best AF for each stratification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4505 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 17:51:18 +00:00
asivache
42c3d74432
bug fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4503 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 16:27:40 +00:00
chartl
c9d473edee
More changes to Variant Eval and Genotype Concordance (passes all integration tests):
1: -sample can now include a file, which will be parsed for sample-name entries
2: If you request a sample to run analysis on, but it is not present in any of your RODs, VEW will exception out
3: Change added to parse Integer, String, and List<Integer> type Allele Count annotations (error otherwise)
4 [slightly problematic]: The count objects now maintain row-keys in order, as the keys were taking an inordinate amount of time in onTraversalDone (multiple calls to getRowKeys(), so many multiple sorts of the same underlying unsorted object, very bad)
There is a legacy comparison object which is unused which I will strip out soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4502 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 12:40:36 +00:00
ebanks
954dd84f51
Adding an integration test (against hg18 this time) that requires on-the-fly sorting in order to work properly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4500 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 07:45:21 +00:00
ebanks
9f54170dff
Hooking up the liftover tool to the new on-the-fly sorting VCF writer so that records can now get emitted in order.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4499 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-15 07:27:01 +00:00