delangel
e6e8a20a1e
1) Fix MyHaplotypeScore to ignore 454 reads, since all those pathological non-existing indels make some sites' score blow up. If a site is only covered by 454 reads, we (hopefully) detect this graciously and just emit a score of 0.0 for the site.
...
2) New annotation SByDepth = log10(-StrandBias/Depth) (non-standard annotation, key name = "SBD"). If StrandBias/Depth happens to be positive (very rare but can happen), annotation gets value=-1000.
3) Abstracted out new class AnnotationByDepth so that QD and SBD can share code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3930 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 15:23:08 +00:00
ebanks
bf60ed0b25
Needed it here too: warn user instead of dying if the R script cannot be executed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3929 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:11:27 +00:00
ebanks
40ffe34686
Warn user instead of dying if the R script cannot be executed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3928 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-04 13:08:15 +00:00
ebanks
17d5e89734
Now --list annotates which modules are Standard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3927 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 21:00:37 +00:00
ebanks
72875cf717
Removing annoying printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3926 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 19:55:00 +00:00
ebanks
2307bed742
VariantEval now uses the "standard" modules only by default. You can add other modules with the -E argument and not use all of the standard ones with -noStandard (they can be added back individually with -E).
...
Generalized some of the packaging code from VariantAnnotator. Matt might want to take a look to make this nicer...?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3925 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 16:51:10 +00:00
ebanks
a7ff9caf54
Added sanity check against bad people and/or crazy big indels at edges of ref context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3918 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 05:37:17 +00:00
hanna
5f1b67c1de
Coping out and forcing the entire GATK (and associated JVM) to use US English
...
locale. Method to force JVM into proper locale exists in CommandLineProgram
and is disabled by default, but implementers of CommandLineProgram can opt in
to the forced US locale by calling a static method.
Question for the VCF developers: I removed the code to explicitly output doubles
in US locale. Do you / how do you want to handle this in applications that use
Tribble outside the GATK?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3917 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-03 03:48:26 +00:00
chartl
2bc69572cb
Make transcript2info capable of handling b37/hg19 contigs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3915 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 17:32:08 +00:00
depristo
c203e0fb02
Added JEXL support for hetCount, homRefCount, and homVarCount in VCs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3914 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-02 12:24:11 +00:00
depristo
7fab5c0a8f
support for -singleton_fp_rate arguments to variant recalibrator instead of the pop.gen. AF prior. Worth experimenting with Ryan.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3913 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-31 21:17:47 +00:00
ebanks
6d91cd587e
Be explicitly clear about which options are for debugging purposes only and shouldn't be used if your username is not ebanks@broad. If only we had a @hidden annotation option for args...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3909 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:18:31 +00:00
depristo
ac8048f17b
Support for automated selects for tranches in variant eval -- use -tf to make tranch-specific ve outputs. ApplyVariantCuts with tranche reading functions for general use, along with todo for ryan. CombineVariants now has --filteredAreUncalled and will treat filtered snps in input VCFs are uncalled, and so won't emit -filteredInOther set features
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3908 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 14:16:43 +00:00
chartl
9231d13252
Minor modification: adding an argument to make slightly more general.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3907 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 05:20:20 +00:00
chartl
db54d63fc7
Hahaha yes, ownage. This now works.
...
BTW, Eric, thanks for forwarding the DepthOfCoverage thread to gsamembers. I'd forgotten about reduce by interval. Mighty helpful in this case!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3906 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 04:23:02 +00:00
chartl
3e3f8c7692
Simple count intervals walker, as per my recent email to GSAMembers. Never use this. It doesn't behave the way you think it does.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3905 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-30 03:39:23 +00:00
delangel
ba1a330293
Corrected location and made more explicit the error message thrown if someone tries to read a VCF 3.3 file with indels, which is not supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3901 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 20:02:47 +00:00
delangel
5af986e0c1
Add an integration test for Beagle (one for ProduceBeagleInput and one for BeagleOutputToVCFWalker)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3897 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-29 18:49:22 +00:00
delangel
e1a34685fd
Add back MyHaplotypeScore as a new implementation for HaplotypeScore, this time as a non-standard annotation. Implementaiton is also better, it computes better consensus haplotypes, ranks them by sum of quality score.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3890 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 21:23:19 +00:00
hanna
6c93b13428
A Java sizeof, implemented using the Java instrumentation API. Can either get the memory consumed either only by a single
...
object or by a single object and all the references it contains. Requires a command-line change to add a Java agent to
the command-line; see the Sizeof.java javadoc for details.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3889 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 18:44:15 +00:00
rpoplin
f5566a6593
Knocking out some quick findBugs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3887 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 14:10:59 +00:00
delangel
894623858d
OK, bad idea to add new temporary annotation - revert to keep integration tests hapy.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3886 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 12:07:13 +00:00
delangel
71bfb1ee35
First redesign of HaplotypeScore - now, a different approach is taken to build possible haplotypes at a site: first, all possible haplotypes consistent with reads are formed (reference is not used). After this list has been formed, it is ranked according to the number of reads that are consistent with it and the two most popular haplotypes are chosen.
...
this reduces to the old method in typical cases, but it builds haplotypes correctly if there are two variants close by within a context window.
Annotation is temporarily named MyHaplotypeScore so it can be run in parallel with old one, soon it will be renamed after some more testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3885 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 10:54:56 +00:00
delangel
cffebcc867
Small utility walker used for production of the Beagle data processing paper section. Walker will print out to output file, for every site common to a reference vcf and an eval vcf, a given sample's depth, hapmap AC and AF and pre/post Beagle genotype as well as corresponding reference (e.g. Hapmap) genotype.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3884 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 03:00:17 +00:00
ebanks
1d9ed1e214
Cleanup of old VCFRecord code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3883 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 02:56:47 +00:00
ebanks
7dd55fbf13
Archiving
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3882 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-27 02:47:18 +00:00
aaron
9667942e52
fix for Ryan's issue: we also need to sync when we store a resource.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3881 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-26 22:17:47 +00:00
hanna
8b072b59e2
Returning index dumping functionality in BAMFileStat to a useable state.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3880 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-26 20:03:50 +00:00
depristo
19ad44d332
Minor improvements to CombineVariants to handle the complex case from Chris. IntegrationTest of complex case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3876 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 13:46:11 +00:00
ebanks
7c5a3836db
Trivial changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3875 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 04:00:47 +00:00
ebanks
56de475f11
Based on feedback from non-GSA users, who claim that our exceptions are 'scary and overwhelming,' I've cleaned up the error message to first describe the error and what users should do and then ask them to copy the subsequent stack trace into their GetSatisfaction posting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3874 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 03:57:44 +00:00
ebanks
9bd8a2685b
Because the performance tests were busted on LSF, no one caught this error until now: when Matt changed over the contract for the AlignmentContext, this line needed to get updated too. All is well now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3873 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-25 02:53:01 +00:00
depristo
b551eaf8fd
Actually commit the code that makes variant eval run in a reasonable amount of time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3872 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-24 17:32:03 +00:00
depristo
b0b37c3476
No handles (I believe) reference only VCs correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3871 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 23:09:23 +00:00
depristo
e21376219d
Updates to CombineVariants for Tim. -setKey can be null. Integrationtests for -setKey foo and -setKey null.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3870 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 22:35:52 +00:00
delangel
26bb1cd9ce
Fix broken test correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3869 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 20:47:41 +00:00
delangel
4fc1db7aaf
Change interface to VCFWriter add() method to take only 1 byte from reference (since that's the only thing it needs), to prevent bugs like having people call it with ref.addBases() which is wrong (since it provides bases starting from the left of reference context window).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3868 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 20:24:03 +00:00
aaron
b3fd145161
fix for a bug deep in the tribble indexing: if you had a single record in the first contig, the second contig's index blocks would point to the wrong file seek location, and you'd see no
...
features in that contig. Thanks to Mark for finding this. I'm not rev'ing the index version (which would cause all indexes to be rebuilt), since this seems like a pretty rare edge case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3865 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 18:39:55 +00:00
depristo
33090629ea
VariantEval can now see the EvaluationContext group objects, so they can decide if/when to print interesting sites. GenotypeConcordance has a hard-coded option to print FNs that is on the way to being generally useful. VCFWriter now uses the US locale for formatting floating point numbers; I believe this fixes a long-standing annoyance. Italian guys will check on this.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3864 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 17:16:50 +00:00
delangel
5eef15cfdf
a) Bad bug fix to CombineVariants: when indels were being merged, the reference base provided was wrong - ref.getBases()[0] was being used, but this returns bease at start of window. Instead, the reference at current locus should be used.
...
b) Cosmetic change to Beagle annotation description.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3861 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 15:13:47 +00:00
ebanks
4ff8b8fc0e
1. Fixing a bug that Mark found where indel-containing clipped reads would get an original cigar tag even when they didn't actually get modified.
...
2. Added some useful logging messages.
3. Added a oneoffs walker to calculate the number of realigned reads and intervals containing them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3860 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 14:24:01 +00:00
chartl
973934f769
Depth of coverage now uses longs rather than ints. We can now successfully run on the Lepidosiren paradoxa genome. (about 80 GB)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3859 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 14:14:12 +00:00
depristo
536399eaa0
Improvements to variant combine. Now calculates AC/AN/AF correctly by calling into the VariantAnnotator engine. Automatically removes annotations that are inconsistent across incoming VCs (in simpleMerge). TODO bug fix for Guillermo/Eric.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3858 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 13:33:11 +00:00
aaron
9579aace1f
updates to code dependent on Tribble, as well as the following Tribble changes:
...
- makes writing to disk optional for indexes using the indexCreator classes (allow the user to specify the index file, if null don't write it)
- removed some system.out debugging code
- fixed version checking in interval tree
- made indexes store and return a LinkedHashSet for sequence names (to ensure they've preserved the ordering in the file)
- index creators now read the file before creating the index
- changed the Index.write() method to take a LEDataStream instead of a file
- removed the sequence dictionary code on the header
- added utils for getting LEDataStreams
- added a base Tribble exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3857 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 01:56:10 +00:00
ebanks
c5325b03be
1) Removed hard-coded strings. Please let's use the fields defined in VCFConstants.
...
2) General code cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3856 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 01:49:47 +00:00
hanna
e9d243babb
More improvements to exception handling during multithreaded runs based on
...
a bug reported by Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3855 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 22:13:01 +00:00
hanna
83798225ac
Repackaged datasource-specific command-line tools into their own package. Added a tag renamer tool.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3854 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 19:50:34 +00:00
delangel
98caedb5f0
Forgot to update VCF4 unit test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3853 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 16:25:51 +00:00
asivache
485023ba8e
this.intersect(that) method added to GenomeLoc (returns intersection of two intervals or dies if the locations do not overlap)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3852 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 16:00:30 +00:00
asivache
3308d956f4
Added utility shortcut method: getOriginalQualsInCycleOrder(read)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3851 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 15:44:25 +00:00