ebanks
e2d45ec2af
Make Indel Realigner exceptions related to not enough space on disk or a too low file-handle limit UserExceptions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4801 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-07 16:37:31 +00:00
depristo
70980b659a
CombineVariants no longer requires rod_priority_string
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4800 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-07 15:39:43 +00:00
depristo
bc885b7bd0
Don't print debugging output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4799 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-06 20:57:11 +00:00
depristo
c91712bd59
BAQ calculation refactoring in the GATK. Single -baq argument can be NONE, CALCULATE_AS_NECESSARY, and RECALCULATE. Walkers can control bia the @BAQMode annotation how the BAQ calculation is applied. Can either be as a tag, by overwriting the qualities scores, or by only returning the baq-capped qualities scores. Additionally, walkers can be set up to have the BAQ applied to the incoming reads (ON_INPUT, the default), to output reads (ON_OUTPUT), or HANDLED_BY_WALKER, which means that calling into the BAQ system is the responsibility of the individual walker.
...
SAMFileWriterStub now supports BAQ writing as an internal feature. Several walkers have the @BAQMode applied to this, with parameters that I think are reasonable. Please look if you own these walkers, though
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4798 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-06 20:55:52 +00:00
depristo
5d2c2bd280
Just refactoring into utils/baq directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4795 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-06 17:43:43 +00:00
depristo
80f32712dc
Tiny bug fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4793 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 18:48:33 +00:00
depristo
44feb4a362
Improved BAQ implementation. Now supports adding BAQ tags to reads on the fly with ADD_TAG_ONLY option. Caching fasta reader implementation, and changes throughout the system to enable this. Many performance improvements throughout the system due to better reference access patterns.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4792 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 18:29:39 +00:00
ebanks
8901e63879
Cheap optimization: don't keep calculating the log of a constant. (How did I not catch this before?)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4791 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 04:36:21 +00:00
ebanks
bef48e7a42
For Chris, to make his life easier: iterate over all VCF records passed in looking for one with an ALT allele defined instead of assuming all records have one.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4789 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-05 02:23:38 +00:00
depristo
97c94176c0
Immediate, obvious bug fix to avoid blowing up on unmapped reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4788 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-04 20:43:39 +00:00
depristo
a5b3aac864
Engine-level BAQ calculation now available in the GATK [totally experimental right now]. -baq argument to disable (NONE), to only use the tags in the BAM (USE_TAG_ONLY), use the tag when present but calculate on the fly as necessary (CALCULATE_AS_NECESSARY), and to always recalculate (RECALCULATE_ALWAYS). BAQ.java contains the complete implementation, for those interested. ValidateBAQWalker is a useful QC tool for verifying the BAQ is correct. BAQSamIterator applies BAQ to reads, as needed, in the engine. Let me know if you encounter any problems. Before prime-time, needs a caching implementation of IndexedFastaReader to avoid loading *lots* of reference data all of the time
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4787 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-04 20:23:06 +00:00
fromer
b12cec4302
Added emitOnlyMNPs flag
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4785 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 20:34:17 +00:00
fromer
6d4ec7f9e7
Remove RefSeq INFO from MNPs since annotating them properly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4784 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 19:03:35 +00:00
fromer
4719bbc772
Changed dontRequireSomeSampleHasDoubleAltAllele parameter to mean that merging should only start at a polymorphic site
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4783 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 17:52:56 +00:00
ebanks
ec174dc0ba
As per Menachem's last commit, there's a minimally more efficient way of doing the MQ cap.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4782 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 16:37:08 +00:00
fromer
92cf7744a6
Set minMQ = max(minMQ, minBQ) for phasing since anyway we cap BQ by MQ; also, lowered MIN_BASE_QUALITY_SCORE for phasing to 17 (was previously 20)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4781 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 16:31:13 +00:00
ebanks
237ab1d489
1. As discussed in group meeting today, because we cap BQ by MQ, if MQ < minBQ then we filter the read.
...
2. Update to UGCalcLikelihoods for Chris: require a vcf bound to 'allele' to be provided so that we know exactly which alternate allele we should be calculating GLs for at each site. The user is warned when the VC is not biallelic or there are multiple records at a site.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4780 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 05:57:06 +00:00
delangel
da6a07ad3b
First round of critical fixes to indel genotyper (more to come tomorrow):
...
a) Avoid complete crash of caller that broke due to a recent refactoring by someone who must not be named <cough>EB<cough>... an integration test to avoid this in the future coming soon.
b) Fixed up strand bias computation for indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4779 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-03 02:48:09 +00:00
fromer
e09d6ee56b
write non-MNP VariantContexts records only once (where they start)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4777 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 22:14:26 +00:00
fromer
1515bf6de9
Merged common VCF writing logic into phasing/WriteVCF.java
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4776 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 22:03:02 +00:00
asivache
4e62de4213
Added method getOriginalReadGroupId(): takes merged (in case of collision) read group id as reported by a read coming from the merged stream and returns this read's read group id as it was listed in the original input bam file.
...
IndelRealigner now uses this functionality to correctly un-mangle read group id's in --nWayOut mode (i.e. when we need to write reads into separate output bams with headers matching the original inputs).
Some hidden changes to IndelRealigner: purely testing and development, transparent to the users (hidden option added)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4775 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 21:41:52 +00:00
rpoplin
e5282742f9
Bug fix in CountCovariates, skip over indel records as well as SNPs in the dbsnp file. CountCovariates is now called CountCovariatesWalker. I've always hated that the name was swapped.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4774 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 18:43:24 +00:00
rpoplin
0adf505b53
We no longer look at by-hapmap validation status in the VQSR because using the HapMap VCF file is higher quality. As a side effect we now support the dbsnp 132 vcf file. ApplyVariantCuts now requires that the input VCF rod bindings begin with input, matching the other VQSR walkers. Wiki updated with information about how to obtain the hapmap and 1kg truth sets.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4772 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-02 15:38:45 +00:00
ebanks
99b942b0b4
Removing duplicated header args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4770 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 20:16:53 +00:00
fromer
9ac0f98d0d
Fixed bug in retaining proper RefSeq records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4768 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 18:39:02 +00:00
ebanks
7caf666f48
For Sendu: add a hidden option to allow bams to come out unsorted. We've agreed to let him deal with sorting these puppies on his own.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4767 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:56:13 +00:00
ebanks
3afa841a6a
Fixing docs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4766 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:36:47 +00:00
ebanks
6a6cdc1925
Adding minor usage docs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4765 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:34:33 +00:00
ebanks
0d1c905df3
Adding UGCalcLikelihoods and UGCallVariants so that GSA members can break up the calling process into separate steps (calculate the GLs and then call off of those) - useful for Chris's new batch merger. As the docs say, these are absolutely not supported or recommended for public use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4764 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 17:32:26 +00:00
fromer
b4ef716aaf
As per Eric and Mark's suggestions, separated the segregating MNP merger (MergeMNPs) from the more general merger employed for annotation purposes (MergeSegregatingAlternateAlleles). Both use the same core MergePhasedSegregatingAlternateAllelesVCFWriter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4763 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 16:42:08 +00:00
ebanks
0892daddb0
Improvement for the TGEN folks: when running in the solid recal mode of SET_Q_ZERO_BASE_N, update the NM tag if one was present in the read to reflect the new N's in the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4761 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 04:36:44 +00:00
asivache
a22b1b04e6
SW-turbo. Kind of. This implementation is presumably equivalent to the old one (mathematically), but runs ~10 times faster: inner loops eliminated completely. The author of the original implementation should be sentenced to the galleys. Oh, that would be me...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4760 348d0f76-0448-11de-a6fe-93d51630548a
2010-12-01 00:08:47 +00:00
delangel
2ac938fe4e
1)
...
Minor fixes to avoid crashes vs CG indel files:
- Add count for complex events, not just insertions and deletions
- Handle correctly cases of large indels falling out of bounds of histogram array: added a count of indels ouf of bounds and avoid exceptions.
2) Cosmetic fix for R script assessing UG calling performance: draw red y=x line on top of Simulated vs Estimated AC to get a better view of under/over-estimation of AC.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4758 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 21:08:25 +00:00
rpoplin
af84462f3e
The dev team has decided to change the filter that is added to records that are set to monomorphic by Beagle. It no longer lists the reference allele. Added those filters to the header of the output VCF file. Finally, we no longer use R2=NaN values coming from Beagle.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4757 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 17:19:54 +00:00
ebanks
21256909bb
Not supported. I'm checking this in for Ryan only.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4756 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 16:59:18 +00:00
kshakir
e21a66d876
Updated the Queue GATK generator and packaging to include more dependencies for fullCallingPipeline.q.
...
Set the -bigMemQueue in the FullCallingPipelineTest to GSA to avoid waiting for the week queue when it is busy.
Fixed the package definition of PipelineTest so that scalac won't recompile it every time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4755 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 15:29:40 +00:00
aaron
7f2ded0706
belated special case fix for Menachem; if the results of a BTI and BTIMR produce an empty interval list, exception out. This would be solved long term with better handling or empty and / or null interval lists. I'll add a JIRA
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4754 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 05:49:20 +00:00
ebanks
a181680814
We no longer require dbSNP files to be of the dbsnp rod-type; VCFs will do (provided they are bound to the name 'dbsnp')
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4753 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 03:25:18 +00:00
asivache
8ffea42b75
about 10% improvement in SW alignment (and hence IndelRealigner!) speed by using c-style linearized array representation for matrices instead of java 2D arrays...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4751 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 00:06:50 +00:00
aaron
b03ac61e9d
consolidating the checking of the RMD sequence dictionary against the reference into a single function, and adding an integration test to test that empty VCFs pass (both the indexing and the seq dictionary validation).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4750 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-30 00:01:56 +00:00
hanna
abc13d0a90
Temporary hack: force abort with an intelligent message suggesting that users
...
specify -B:dbsnp,vcf <filename> if the filename passed if the --DBSNP argument
value contains 'vcf'. We'll replace this functionality once dbSNP 132 starts
playing nicely with the tagging system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4749 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 23:37:30 +00:00
ebanks
d89e17ec8c
Fare thee well, UGv1. Here come the days UGv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4747 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 21:51:19 +00:00
fromer
727dac7b7a
Added MNP annotation of the number of AA changes occuring in the SAME RefSeq entry (numAAchanges), and if this number is > 1 for any of the alt alleles (alleleHasMultAAchanges)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4746 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 21:24:30 +00:00
ebanks
222cd42ceb
Have the UG engine take care of the GL to PL conversion. Note that we still use GLs for calling (since we are losing precision in high-pass and, even worse, it can affect QD), but we emit PLs in all cases. This means that calculating the GLs, emitting them to VCF, and then calling off of them (a la samtools) is absolutely, positively not ideal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4745 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-29 20:28:16 +00:00
ebanks
102c8b1f59
Large refactoring of the UGv2 engine so that it is now truly separated into 2 distict phases: GL calculation and AF calculation, where each can be done independently. This is not yet enabled in UGv2 itself though because I need to work out one last issue or two. Tested on 1Mb of 1000G Aug allPops low-pass and results are identical as before. Also, making BQ capping by MQ mandatory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4744 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 21:36:33 +00:00
ebanks
ce051e4e9a
Write to sdout when no -o is provided
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4743 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-28 06:19:26 +00:00
ebanks
e3e6d176df
Looking over the daily error log email made me realize that there were 2 implementations of vc.modifyLocation() - the correct one in VC that didn't require lazy loading the genotype data and the bad one in VCUtils that did. Removing the implementation in VCUtils and updating the code accordingly. Also, removing createPotentiallyInvalidGenomeLoc() since no one uses it anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4736 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-26 18:40:34 +00:00
ebanks
35b90d2295
Don't compute SB for ref calls
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4735 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-26 03:54:26 +00:00
ebanks
6934f83cc7
Two changes to CombineVariants.
...
1. Fix: VCs were padded before the merge, but they were never unpadded afterwards. This leaves us with a VC that doesn't meet our spec.
2. Update: instead of running the merged VC through every standard annotation (which seems really wrong, since this isn't the annotator tool), just update the chromosome count annotations (AC,AF,AN) through VCUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4734 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-25 04:52:12 +00:00
fromer
d775192631
Check if MNP annotation of amino acid is dependent on the MNP, or could it be obtained through some single-base variant?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4733 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 22:38:33 +00:00
rpoplin
0dd40c3684
Updating doc text
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4732 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 21:34:14 +00:00
rpoplin
ed08899abc
Overwhelming evidence that maxQ = 50 is now a better default than maxQ = 40 in the base quality score recalibrator, especially when combined with dbsnp build 132. Also, added option in ProduceBeagleInputWalker for Beagle-ing chromosome X calls with male samples which sets the genotype likelihood for the AB allele to zero for those samples.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4731 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 21:32:26 +00:00
fromer
ca70ed611c
Totally revamped the MNP annotation and put it in its own walker: AnnotateMNPsWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4730 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 18:05:10 +00:00
depristo
8768e1a240
Useful profiling tool that reads in a single rod and evalutes the time it takes to read the file by byte, by line, into pieces, just the sites of the vcf, and finally the full vcf. Emits a useful table for plotting with the associated R script that can be run like Rscript R/analyzeRodProfile.R table.txt table.pdf titleString
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4728 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 14:59:16 +00:00
ebanks
7a8b85dd15
Catch the JEXL exception when trying to match a variable that's not in the context - and don't filter in these cases. Now everyone can happily go back to using the stupid (and hopefully temporary) AlleleBalance filter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4727 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 05:00:41 +00:00
ebanks
caf2c21f61
Must close the writer to flush the cache
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4726 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:33:08 +00:00
ebanks
816c33c821
indel-related fixes to the strict validator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4725 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 04:08:34 +00:00
delangel
9cdc341be5
Trivial update for data processing paper: change syntax of output argument for Beagle by depth walker to update to new GATK format.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4724 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-24 01:45:44 +00:00
ebanks
ea6e2218c1
1. dbsnp has some massive indels which my left-aligner was barfing on because there isn't enough reference context; fixed. 2. Lower default calling threshold to Q30 for UGv2.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4722 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 19:28:33 +00:00
aaron
53672361cc
capture more details when something IO-related goes wrong in writing a Tribble index
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4720 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 17:06:28 +00:00
hanna
082073ca3c
Stop RBP.getPileupBySample() from throwing a NullPointerException if the
...
sample doesn't exist -- now returns null.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4719 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-23 05:17:06 +00:00
kshakir
787e5d85e9
Added the ability to test pipelines in dry or live mode via 'ant pipelinetest' and 'ant pipelinetest -Dpipeline.run=run'.
...
Added an initial test for genotyping chr20 on ten 1000G bams.
Since tribble needs logging support too, for now setting the logging level and appending the console logger to the root logger, not just to "org.broadinstitute.sting".
Updated IntervalUtilsUnitTest to output to a temp directory and not the SVN controlled testdata directory.
Added refseq tables and dbsnps to validation data in BaseTest.
Now waiting up to two minutes for gather parts to propagate over NFS before attempting to merge the files.
Setting scatter/gather directories relative to the -run directory instead of the current directory that queue is running.
Fixed a bug where escaping test expressions didn't handle delimiters at the beginning or end of the String.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4717 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 22:59:42 +00:00
hanna
8ca5edf89f
Fix issue where non-required file inputs can throw a NullPointerException
...
rather than a UserException when an the input argument is specified without
an argument value.
The magnitude of code required to fix this points to a need to give the
command-line argument system a good spring cleaning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4714 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-22 01:49:17 +00:00
ebanks
b9a59ea54f
Adding Het/Hom ratio to the temp per sample metrics. Because I'm in a generous mood tonight, I'm going ahead and fixing the paths for the classes I'm touching...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4713 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 04:24:42 +00:00
ebanks
cff7c6ddce
These are user exceptions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4712 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-21 02:08:11 +00:00
bthomas
374c0deba2
Updating the core LocusWalker tools to include the Sample infrastructure that I added last month. This commit touches a lot of files, but only significantly changes a few: LocusIteratorByState and ReadBackedPileup and associated classes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4711 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 19:59:05 +00:00
kshakir
c723db1f4b
Added a -summary jexl argument to VariantEval similar to -validate.
...
Updated the package of ValidationGenotyper to match the file location.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4710 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-19 04:42:46 +00:00
kshakir
79725f2d9c
Excluding the QFunction log files from the set of files to delete on completion.
...
When a QGraph is empty displaying a warning instead of crashing with an JGraph internal assertion error.
Cleaned up code using the Log4J root logger and explicitly talking to a logger for Sting.
When integration tests are run detecting that the logger has already been setup so that messages aren't logged twice.
Updated from Ivy 2.2.0-rc1 to 2.2.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4707 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 20:22:01 +00:00
depristo
721e8cb679
VariantsToTable now supports wildcard captures. -F PREFIX* now captures all fields that begin with PREFIX, output as a comma-separated list of unique values. Added integration test for VariantsToTable since I find it so useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4706 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 18:54:59 +00:00
depristo
8cba86a69d
Trivial code organization for the haplotype score
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4703 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 12:32:55 +00:00
hanna
9f356b6cd0
Package all walkers in org/broadinstitute/sting/gatk/walkers directory in release.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4702 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-18 02:33:45 +00:00
hanna
90711d445c
Change the interface for RMDTrackBuilder, therefore always mandating the specification
...
of a sequence dictionary and related info. This will hopefully eliminate the cases in
which the refseq track depends a sequence dictionary / contig parser that hasn't been
specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4700 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 19:00:17 +00:00
fromer
367cc9135f
Use VariantContext and Genotype accessor methods for attributes that will return null for unparseable data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4699 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 18:19:56 +00:00
fromer
2f3578182a
Added VERY preliminary version for merging refseq annotations as SNPs are merged
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4698 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:49:12 +00:00
fromer
e2f7f33ce7
Added getIntegerAttribute()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4697 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:33:07 +00:00
depristo
d86ab2becb
JEXL expressions now generate exceptions, not warnings. Tools should catch the runtime exception to handle correctly. Removed unncessary complexity from the JEXL contexts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4695 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 16:08:16 +00:00
delangel
539651de30
Initial version of Indel Statistics module for Variant Eval - not for general use yet, needs more verification and more work. Older IndelHistogram module will be obsolete with this new walker. Right now, for each sample (and for all samples), the following are computed:
...
- Number of insertions
- Number of deletions
- Length distribution for indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4694 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-17 15:52:01 +00:00
kshakir
01b721ab61
Passing ReviewedStingExceptions through the HMS.
...
Added a @Hidden experimental argument -validate to VariantEval that allows external JEXL assertions that must evaluate to true will throw an exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4692 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 21:50:42 +00:00
hanna
24ec35deaf
- Reintroduce test dependency so that the tests passing / failing is not
...
dependent on the contents of the integrationtest directory. Will figure
out how to better manage the integrationtest directory at some point in
the future.
- Up the max heap size for tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4691 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 19:55:20 +00:00
fromer
62f02bf30a
Minor JAVA visibility updates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4690 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 15:28:58 +00:00
ebanks
f1b0f3bc49
Putting my changes from earlier in the day back in after someone (rhymes with 'Dark') trounced on them with his last commit...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4687 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-16 01:55:50 +00:00
hanna
8ff4e4cb25
Cleanup testng listener configuration.
...
- Add StingTextReporter, which provides a text dump of the errors to the
console. Had to create our own reporter (inheriting from the standard
TestNG TextReporter) to work around a configuration issue with the
TextReporter. In an ideal world, I'd report this on the TestNG mailing
list and help them resolve the issue, but this solution is relatively
robust at the moment and life is too short.
- Added back the failed test listener, which generates the testng-failed.xml
file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4686 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 23:43:14 +00:00
rpoplin
b677080858
Initial checkin of the ValidationGenotyper. Not intended to be used by anybody yet. Only here for archival purposes at this point.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4685 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 22:33:49 +00:00
depristo
ef2f6d90d2
VQSR now operates on LOD scores in the INFO field directly, and doesn't adjust the QUAL field. New format for tranches file uses LOD score. Old file format no longer supported. log10sumlog10() function, a very useful utility in MathUtils. No more ExtendedPileupElement! Robust math calculations in GMM so that no infinities are generated! HaplotypeScore refactored to enable use of filtered context. Not yet enabled... InferredContext getDouble and getInteger arguments now parse values from Strings if necessary
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4684 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 22:19:22 +00:00
hanna
5b83942cee
- Fix DepthOfCoverage so that, when it abuses the ROD system by instantiating a track in onTraversalDone, it also supplies the correct sequence dictionary and parser.
...
- Changed RMDTrackBuilder to use SequenceDictionaryUtils.validateDictionaries for ref <-> ROD sequence dictionary validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4683 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 20:34:04 +00:00
ebanks
2af508ef83
Better docs, as requested by Matt
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4681 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 18:24:15 +00:00
depristo
62be55376b
no longer useful
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4677 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 17:53:33 +00:00
ebanks
35382468ee
Better error checking/output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4676 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 16:36:34 +00:00
depristo
7a3a464959
Finally, the logic is right
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4674 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 14:02:09 +00:00
depristo
8d66637fc2
Bug fix for VariantsToTable with filtered records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4673 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:49:16 +00:00
depristo
d76b87d6e3
Useful debug file output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4672 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 13:36:52 +00:00
ebanks
28142408ff
Refactoring so that all counting in UGv2 is done on the filtered context. In particular, tests for empty pileups and too many spanning deletions now use the correct counts. Also, -all_bases mode now trumps all; this one is for you, chartl.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4671 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 05:01:12 +00:00
ebanks
c7229abbf7
Get rid of 'meaningless and random values' that prevent Sendu from merging PG lines. I have to admit that he did have a good point there.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4670 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-15 03:35:12 +00:00
kshakir
2fd816ac5f
Updated ordering of integration tests. GVC > VR > AVC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4669 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 06:33:28 +00:00
delangel
cb1e8ad43a
Temp bug fix for indel genotyper: if there are two or more variant contexts at a site, just choose the first one containing an indel and genotype that. There might be cases where IGv2 emits 2 indel variant contexts in at the same ref location which made us fail there. A better solution will be to form underlying haplotypes supported by reads and compute likelihoods of that.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4667 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-14 00:21:54 +00:00
depristo
82f9327b5e
Throw the right exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4666 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 22:18:42 +00:00
depristo
44d0cb6cde
New version of cutting routines for VQSR. Old code removed. Working unit tests. Best practice with testng integration test (everyone look at it). Walker test now allows you to not specify no. input files, if it can infer input counts from MD5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4664 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 16:19:56 +00:00
kshakir
62a106ca5a
Disabled VariantGaussianMixtureModelUnitTest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4663 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-13 03:53:33 +00:00
kshakir
673fa841a4
Updated PluginManager so that during testing Queue can dynamically compile and load separately multiple class directories into the same class loader.
...
Removed obsolete usages of PackageUtils with updated PluginManager.
Ported Queue interval utilities written in scala over to Sting's java IntervalUtils.
Added a very basic intergration test to ensure that the fullCallingPipeline.q compiles.
Added options to specify the temporary directories without having to use -Djava.io.tmpdir (useful during the above integration test).
While adding tempDir added options to specify the run directory from the command line, for example "-runDir v1".
Upgraded to scala 2.8.1 and updated calls to deprecated functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4661 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:14:28 +00:00
depristo
42acc968b1
Unit tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4660 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 20:09:39 +00:00
ebanks
b51762c279
When you commit code late at night you tend to make careless mistakes... like forgetting to update integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4658 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:41:10 +00:00
depristo
988da428ae
Bug fix for old style tranches file. ApplyVariantCuts moved over, and passes integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4657 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 14:38:26 +00:00
depristo
c5f8c4dd0d
VariantEval test for tranches file, plus cutting over VE to use the generic Tranches framework
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4656 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 13:52:40 +00:00
ebanks
69de3e51bf
Better precision for the calculated AF value. Now looks at the total number of samples to determine how much precision is necessary. Also, changing default min BQ used for calling in UGv2 to Q17.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4655 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-12 08:31:40 +00:00
depristo
ec83a4b765
Initial commit, without any tool changes, of a new infrastructure for determining tranches. This new version walker up from the lowest quality snps and determines Ti/Tv. This is marginally more stable than moving in the other direction when there are few novel variants (exomes). Can make a substantial difference in the size of the call set (10-20%). I'll hook it into the main system now. Includes an new class Tranche, isolated read/writing utilities that are now testing in TestVariantRecalibrator, which should be moved to UnitTest as soon as I can figure out how to do this on my mac.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4654 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:52:49 +00:00
depristo
ed6396ed43
No longer getting the inet, it seems to potentially hang the JVM
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4653 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 23:49:42 +00:00
ebanks
2f6666a988
Correcting traversal statistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4652 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 22:46:58 +00:00
depristo
dbde721dd0
Bug fix for filtered records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4651 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:54:51 +00:00
aaron
698e5cf345
for GATK style codecs, make sure we fill in their GenomeLocParser from the RMDIndexer
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4650 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 18:44:15 +00:00
delangel
2f3be24a00
Improvement in exact allele frequency calculation model (still under test, but this is definitely better than what I had before). Instead of approximating log(10^x+10^y) as max(x,y), approximate full Jacobian formula max(x,y)+log(1+10^-abs(x-y)) with static lookup table for the second term.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4647 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-11 01:22:35 +00:00
asivache
2e0296fef9
NWayOut logic slightly changed: 1) results.list file is gone; 2) now with -nWayOut one can specify either a) suffix to attach to every output file (i.e. cleaned reads from inputK.bam will be sent to inputK.suffix.bam) or b) *.map tab-separated file that must list <input_name> <output_name> mappings, one per line, for every input file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4645 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:32:16 +00:00
asivache
a1adfb91ce
And now @Hidden tags are really in place :-/
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4644 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:28:40 +00:00
asivache
68ce55148e
(pseudo-)genotyping functionality added: force-emits calls (including REF) at specified locations. Currently @Hidden for testing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4643 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 20:25:40 +00:00
hanna
8e36a07bea
Convert GenomeLocParser into an instance variable. This change is required
...
for anything that needs to be simultaneously aware of multiple references, eg
Queue's interval sharding code, liftover support, distributed GATK etc.
GenomeLocParser instances must now be used to create/parse GenomeLocs.
GenomeLocParser instances are available in walkers by calling either
-getToolkit().getGenomeLocParser()
or
-refContext.getGenomeLocParser()
This is an intermediate change; GenomeLocParser will eventually be merged
with the reference, but we're not clear exactly how to do that yet. This
will become clearer when contig aliasing is implemented.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4642 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-10 17:59:50 +00:00
depristo
5ef4b234d8
Updates for broken integration tests. Counting annotations (AC, AF) now work correctly for AC = 0 sites
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4640 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-09 19:43:43 +00:00
depristo
4759fdd2ac
V1 of read and variant simulator and assessor. SimulateReadsForVariants generates BAM and VCF with given combinations of variant and read properties. AssessSimulatedPerformance produces a table suitable for analysis in R
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4637 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-08 21:01:33 +00:00
aaron
97db593efb
making my last commit message actually true
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4636 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:26:23 +00:00
aaron
be499fc986
making the reference optional (the GATK will set it on the first run if it's not included), and setting the seq index if they do supply it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4635 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 18:15:31 +00:00
ebanks
e05af54f3e
Found the cause of 80% of our non-called FNs: an excess of filtered bases were causing us to choose the wrong alternate allele. More details to dev team.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4634 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-07 03:39:57 +00:00
aaron
2a8c97a4a7
better error catching, as well as allowing for default index naming, <filename>.idx
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4633 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 19:12:19 +00:00
aaron
cb2e26a004
by request, an indexer tool to create Tribble style indexes outside of the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4632 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-06 18:59:06 +00:00
depristo
bbb890dd6c
Bug fix for variants in VCF header fetching to avoid null pointer when a VariantContext tribble codec doesn't have a header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4630 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-05 12:43:25 +00:00
ebanks
c9dbd8f80a
Bug fix for Tim: all point events must be treated equally
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4629 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-05 03:42:51 +00:00
rpoplin
913db5d1ab
Unfortunately when annotating sites with the UG the -G None option was wiping out the single annotations added by -A options
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4625 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 19:27:23 +00:00
ebanks
816c86776e
Walker description was wrong and it was bothering me
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4624 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 02:17:09 +00:00
ebanks
87f6738d4c
Deprecated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4623 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-04 02:07:40 +00:00
chartl
42e9987e69
Bug fix to GenotypeConcordance. AC metrics get instantiated based on number of eval samples; if Comp has more samples, we can see AC indeces outside the bounds of the array.
...
Bug fix to LiftoverVariants - no barfing at reference sites.
AlleleFrequencyComparison - local changes added to make sure parsing works properly
Added HammingDistance annotation. Mostly useless. But only mostly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4622 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 19:23:03 +00:00
fromer
3d27defe93
Fixed output stats (percentage denominator)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4621 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 18:47:06 +00:00
ebanks
4e109f58bf
In preparation for Ryan's jumping into SLOD: getting rid of bad hack to ensure P(AF=i) is calculated in the strand-specific cases. With Mark's recent changes this is no longer necessary and just makes the code slower.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4620 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 03:44:59 +00:00
fromer
22d64f77ff
Added hidden --outputMultipleBaseCountsFile option to detect cases where a single read has more than one base at the same position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4619 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-03 03:22:48 +00:00
hanna
8f9bf82aa7
Bamboo is correctly interpreting test fails. Reverting forced-fail test
...
code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4617 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-02 19:32:34 +00:00
hanna
1df166b76e
Forcing a unit test fail to ensure that Bamboo is picking up on failed tests
...
as well as successes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4616 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-02 19:03:12 +00:00
fromer
a885ecf046
When merging MNPs, the phased flag and the phase quality (PQ) are determined simultaneously
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4613 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-02 14:44:26 +00:00
hanna
861ee3e37a
Changing testing framework from junit -> testng, for its enhanced configurability.
...
Initial test to see how Bamboo will respond. More detailed email to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 21:31:44 +00:00
asivache
fe3f78e1d3
make it full (absolute) path for the file names recorded in results.list
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4608 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:53:51 +00:00
asivache
2ac5e55130
typo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4607 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:38:02 +00:00
asivache
0e6dd38936
In n-way-out mode, added printing names of all the output files into 'results.list' file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4606 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 20:37:38 +00:00
fromer
64599d1074
Added debugging message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4605 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 19:51:42 +00:00
fromer
639ecdc931
Noted in comment that using a single sample in MergePhasedSegregatingAlternateAllelesVCFWriter does NOT update any of the INFO fields, though this could be changed in the future...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4604 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 19:02:52 +00:00
fromer
8439f0aa61
Check for VCFConstants.MISSING_VALUE_v4 when retrieving INFO fields and consider such values as non-existent
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4603 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 17:51:35 +00:00
asivache
aadd230636
N-Way-Out is back. Now uses SAMReadID to identify each read's source bam, so should be reliable. Interface is sort of ugly fo now: to generate output file names, .bam is stripped from input file names, then the value of -nWayOut argument is pasted on (and all the output files are written into the current dir).
...
Unrelated change: in the sorted-target mode (when we read sorted target intervals one by on from a file), one can now specify multiple semicolon-separated interval files (all must be sorted). Not hugely useful probably, but makes --targetIntervals always process its values in exactly the same way, so we are consistent (it has been already taking ;-separated args in unsorted mode)
NwayIntervalMergingIterator: reads in multiple sorted GenomeLoc input streams (iterators) and presents them as a single sorted and merged stream
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4602 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 16:06:51 +00:00
depristo
23cb399a88
Reasonable first pass at a correct SB calculation. Simple utilities to support it. VariantsToTable no longer prints filtered sites by default. New non-standard variant eval module to print comp sites not present in eval (FN finder)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4601 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-31 12:41:52 +00:00
delangel
30fae5cf18
Major redo of exact AF computation for UnifiedGenotyperV2. Fact of life is, there's no way we can compute an exact QUAL field and keep performing the AF computation in linear probability space. In good sites with lots of samples, the ratio of Pr(AC=K*|D) to Pr(AC=0|D) can be 10^1500 or some ridiculous large number like that, which no double can represent. So, we abandon probablity space and work now in log likelihood space, which has several major repercussions:
...
a) Sites were numerically well behaved now, but another hard fact of life is that the AF iteration is defined in linear Pr space, not in log likelihood space, and the math doesn't work out in log space. So, we need to convert back and forth from lin to log space.
b) As a consequence of a), the code got a major slowdown, and calling the 629 samples was about 15 times slower than before (sic).
c) To solve b), log10 of integers are now cached at init, and numerical approximations are now made. Most importantly, I'm using the approximation that log(exp(a) + exp(b)) ~= max(a,b) which seems almost inconsequential in practical performance but reduces computation time to what it was before. More detailes analyses are forthcoming. This approximation can be refined further on to avoid expensive log-exp conversions if further profiling and analysis deems it necessary.
Also, two other issues were solved:
a) Strand bias computation was actually wrong in the case where the optimal AC was bigger than max(forward reads,reverse reads). Now the code is exactly as buggy as the grid search model (all bugs are equal, but some are more equal than others)
b) Genotype likelihoods are now computed in a better way and if a likelihood < 0 we don't just cap to 0 but do something a bit smarter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4600 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-31 01:26:04 +00:00
hanna
d492621122
The TraversalEngine's habit of hanging onto old ROD states seems to have a bad
...
interaction with Tribble. In Tribble, keeping these references in memory until
the shard is flushed means keeping one 512K character buffer per object in
memory. Fixed by purging the reference to the object at the end of the
shard traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4599 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 17:09:58 +00:00
ebanks
1c056ea791
Users can now use VariantAnnotator to add annotations from one VCF to another. For example, if you want to annotate your target VCF with the AC field value from the rod bound to CEU1kg, you can specify -E CEU1kg.AC and records will be annotated with CEU1kg.AC=N when a record exists in that rod at the given position.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4598 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 16:38:31 +00:00
ebanks
1b3fc8ddd2
Doing things too quickly is also naughty. Thanks, Andrey. Now, we're even.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4597 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:50:04 +00:00
ebanks
58f7b4c595
Naughty use of assertions means that malformed records are not caught.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4596 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 14:41:38 +00:00
delangel
9a60e72364
Trivial change to LeftAlignVariants: make walker return number of aligned variants on map(), and print out the # of aligned variants at the end of the traversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4595 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-29 02:03:36 +00:00
hanna
2f8057bf24
Cleanup for multithreading memory leak during integration tests...unregister MXBean at end
...
of traversal to avoid holding a reference to the microscheduler, which holds a reference to
the engine, which in turn holds a reference to the walker, which itself holds a reference to
all the data aggregated during the course of the traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4594 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 18:37:42 +00:00
depristo
860de05a7c
Bug fix for PL vs. GL in header. PL now truly default output for UGv2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4592 348d0f76-0448-11de-a6fe-93d51630548a
2010-10-28 12:39:18 +00:00