gatk-3.8

Commit Graph

Author	SHA1	Message	Date
delangel	e1a34685fd	Add back MyHaplotypeScore as a new implementation for HaplotypeScore, this time as a non-standard annotation. Implementation is also better, it computes better consensus haplotypes, ranks them by sum of quality score. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3890 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 21:23:19 +00:00
hanna	6c93b13428	A Java sizeof, implemented using the Java instrumentation API. Can get the memory consumed either by a single object alone or by a single object and all the references it contains. Requires a command-line change to add a Java agent; see the Sizeof.java javadoc for details. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3889 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 18:44:15 +00:00
rpoplin	f5566a6593	Knocking out some quick findBugs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3887 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 14:10:59 +00:00
delangel	894623858d	OK, bad idea to add new temporary annotation - revert to keep integration tests happy. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3886 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 12:07:13 +00:00
delangel	71bfb1ee35	First redesign of HaplotypeScore - now, a different approach is taken to build possible haplotypes at a site: first, all possible haplotypes consistent with reads are formed (reference is not used). After this list has been formed, it is ranked according to the number of reads that are consistent with it and the two most popular haplotypes are chosen. this reduces to the old method in typical cases, but it builds haplotypes correctly if there are two variants close by within a context window. Annotation is temporarily named MyHaplotypeScore so it can be run in parallel with old one, soon it will be renamed after some more testing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3885 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 10:54:56 +00:00
delangel	cffebcc867	Small utility walker used for production of the Beagle data processing paper section. Walker will print out to output file, for every site common to a reference vcf and an eval vcf, a given sample's depth, hapmap AC and AF and pre/post Beagle genotype as well as corresponding reference (e.g. Hapmap) genotype. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3884 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 03:00:17 +00:00
ebanks	1d9ed1e214	Cleanup of old VCFRecord code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3883 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 02:56:47 +00:00
ebanks	7dd55fbf13	Archiving git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3882 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-27 02:47:18 +00:00
aaron	9667942e52	fix for Ryan's issue: we also need to sync when we store a resource. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3881 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-26 22:17:47 +00:00
hanna	8b072b59e2	Returning index dumping functionality in BAMFileStat to a useable state. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3880 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-26 20:03:50 +00:00
depristo	19ad44d332	Minor improvements to CombineVariants to handle the complex case from Chris. IntegrationTest of complex case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3876 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 13:46:11 +00:00
ebanks	7c5a3836db	Trivial changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3875 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 04:00:47 +00:00
ebanks	56de475f11	Based on feedback from non-GSA users, who claim that our exceptions are 'scary and overwhelming,' I've cleaned up the error message to first describe the error and what users should do and then ask them to copy the subsequent stack trace into their GetSatisfaction posting. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3874 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 03:57:44 +00:00
ebanks	9bd8a2685b	Because the performance tests were busted on LSF, no one caught this error until now: when Matt changed over the contract for the AlignmentContext, this line needed to get updated too. All is well now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3873 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-25 02:53:01 +00:00
depristo	b551eaf8fd	Actually commit the code that makes variant eval run in a reasonable amount of time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3872 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-24 17:32:03 +00:00
depristo	b0b37c3476	Now handles (I believe) reference-only VCs correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3871 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 23:09:23 +00:00
depristo	e21376219d	Updates to CombineVariants for Tim. -setKey can be null. Integrationtests for -setKey foo and -setKey null. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3870 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 22:35:52 +00:00
delangel	26bb1cd9ce	Fix broken test correctly git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3869 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 20:47:41 +00:00
delangel	4fc1db7aaf	Change interface to VCFWriter add() method to take only 1 byte from reference (since that's the only thing it needs), to prevent bugs like having people call it with ref.addBases() which is wrong (since it provides bases starting from the left of reference context window). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3868 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 20:24:03 +00:00
aaron	b3fd145161	fix for a bug deep in the tribble indexing: if you had a single record in the first contig, the second contig's index blocks would point to the wrong file seek location, and you'd see no features in that contig. Thanks to Mark for finding this. I'm not rev'ing the index version (which would cause all indexes to be rebuilt), since this seems like a pretty rare edge case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3865 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 18:39:55 +00:00
depristo	33090629ea	VariantEval can now see the EvaluationContext group objects, so they can decide if/when to print interesting sites. GenotypeConcordance has a hard-coded option to print FNs that is on the way to being generally useful. VCFWriter now uses the US locale for formatting floating point numbers; I believe this fixes a long-standing annoyance. Italian guys will check on this. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3864 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 17:16:50 +00:00
delangel	5eef15cfdf	a) Bad bug fix to CombineVariants: when indels were being merged, the reference base provided was wrong - ref.getBases()[0] was being used, but this returns the base at the start of the window. Instead, the reference at current locus should be used. b) Cosmetic change to Beagle annotation description. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3861 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 15:13:47 +00:00
ebanks	4ff8b8fc0e	1. Fixing a bug that Mark found where indel-containing clipped reads would get an original cigar tag even when they didn't actually get modified. 2. Added some useful logging messages. 3. Added a oneoffs walker to calculate the number of realigned reads and intervals containing them. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3860 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 14:24:01 +00:00
chartl	973934f769	Depth of coverage now uses longs rather than ints. We can now successfully run on the Lepidosiren paradoxa genome. (about 80 GB) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3859 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 14:14:12 +00:00
depristo	536399eaa0	Improvements to variant combine. Now calculates AC/AN/AF correctly by calling into the VariantAnnotator engine. Automatically removes annotations that are inconsistent across incoming VCs (in simpleMerge). TODO bug fix for Guillermo/Eric. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3858 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 13:33:11 +00:00
aaron	9579aace1f	updates to code dependent on Tribble, as well as the following Tribble changes: - makes writing to disk optional for indexes using the indexCreator classes (allow the user to specify the index file, if null don't write it) - removed some system.out debugging code - fixed version checking in interval tree - made indexes store and return a LinkedHashSet for sequence names (to ensure they've preserved the ordering in the file) - index creators now read the file before creating the index - changed the Index.write() method to take a LEDataStream instead of a file - removed the sequence dictionary code on the header - added utils for getting LEDataStreams - added a base Tribble exception git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3857 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 01:56:10 +00:00
ebanks	c5325b03be	1) Removed hard-coded strings. Please let's use the fields defined in VCFConstants. 2) General code cleanup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3856 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-23 01:49:47 +00:00
hanna	e9d243babb	More improvements to exception handling during multithreaded runs based on a bug reported by Ryan. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3855 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 22:13:01 +00:00
hanna	83798225ac	Repackaged datasource-specific command-line tools into their own package. Added a tag renamer tool. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3854 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 19:50:34 +00:00
delangel	98caedb5f0	Forgot to update VCF4 unit test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3853 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 16:25:51 +00:00
asivache	485023ba8e	this.intersect(that) method added to GenomeLoc (returns intersection of two intervals or dies if the locations do not overlap) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3852 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 16:00:30 +00:00
asivache	3308d956f4	Added utility shortcut method: getOriginalQualsInCycleOrder(read) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3851 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 15:44:25 +00:00
delangel	473ec91633	a) Bug fix in VCFHeader parsing - Info fields were not being parsed properly, with the result that the Count field was not being properly displayed in records (e.g. if Count=0 for a particular field, the INFO tag was still being displayed as ...;Field=x;... instead of ...;Field;...). b) Bug fixes and update to how we represent indels and other complex events in a VariantContext object. Convention is now that all events are left aligned, with the first variant context location marking the common base before an event occurs. However, alleles in a VC don't have the common base in all VC's. Two new functions are now part of VariantContextUtils: CreateVariantContextWithPaddedAlleles and CreateVariantContextWithTrimmedAlleles. Both take a VC as an input and create a VC as an output. Main flow is that a VCF reader would create a VC with trimmed alleles, all walkers would ideally work with these trimmed alleles, and then the VCF writer would pad back the alleles before writing. However, there are special cases where we need to pad alleles, for example when merging/combining VC's. Pending issues: - PED and DBSNP RODs have to be updated to create VC's for indels following the convention above. Changes will go in after Tribble location is moved and things are tested. - Need to verify Indel genotyper and other modules that create VC's with indels. - Wiki page describing convention above and how walkers should interpret indel VC's still needs updating/detailing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3850 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-22 02:36:45 +00:00
chartl	b696c3ea98	No more traversal reduce results. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3849 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-21 18:34:54 +00:00
chartl	365b42390d	Support for generating (very basic) wiggle files for use with IGV (see UCSC for wiggle spec); and a walker to take in a variant track and create a transition transversion rate track for the whole genome (due to the wiggle spec, this has to be done by chromosome). It's interesting to see the effect of genes! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3848 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-21 18:04:30 +00:00
depristo	f7957bc7f2	Fixed memory leak in VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3845 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-21 12:35:46 +00:00
aaron	1cba81c16f	updates to tribble with fixes for some bugs I've found in some new indexing code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3842 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 22:08:04 +00:00
ebanks	ff6748d1cd	oops - missed one git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3841 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 18:55:19 +00:00
ebanks	c6ad26e04f	1) When quals/GQs are really integers (x.00), strip off the floating points. 2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output. 3) No more "ID" hard-coded all over the code to set the VariantContext ID. Use a static variable instead. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 18:01:45 +00:00
ebanks	0db7fab1a9	Fixing genotype filtering for VF and adding integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3839 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 07:30:21 +00:00
aaron	2a6c2d3098	re-enable test; I was moving the input file in prep for my last commit around on Eric, so he rightfully removed the test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3838 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 07:14:59 +00:00
aaron	0108517b98	updating the Tribble track loading code to use the new shared locks, updated lots of new tests, add infrastructure for the TreeInterval, and removed the old locking class. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3837 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 07:08:10 +00:00
ebanks	f742980864	1. Refactoring of GenotypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon. 2. Moved Jared's VCFTool code into archive so that everything would compile. 3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader. 4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn't exist, so I commented it out. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-20 06:16:45 +00:00
depristo	70b07206a2	CombineVariants tests for Guillermo and Eric to explore the correctness of the in/out reader, writer behavior of the system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3834 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 22:41:48 +00:00
depristo	c47a5ff5ab	Official parallel CountCovariates, passes all integration tests. Now poster-child example of parallelism in GATK (Matt H). Apparent general performance improvements throughout too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3833 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 22:13:18 +00:00
rpoplin	0b56003d1a	Remove stray commented out line git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3832 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 19:14:39 +00:00
rpoplin	8e31c01680	Solid processing in base quality recalibrator now has several options for how to handle no calls in the color space. --ignore_nocall_colorspace is removed and replaced by --solid_nocall_strategy. Fixed some of the @Deprecated tags in BaseUtils. LocusWalkers now filter out FailsVendorQualityCheck reads. HLA caller integration test bam file had bad vendor reads so its integration test changed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3831 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 19:10:29 +00:00
aaron	18b0114e25	remove FixBAMSortOrder walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3830 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 17:27:23 +00:00
aaron	f4cfb0f990	The first step in integrating Jim's tree based index scheme: - changed to a better method for getting headers from Codecs - some removal of old commented out code in the GATKArgumentCollection - changes for the rename of FeatureReader to FeatureSource - removed the old Beagle ROD - cleaned up some of the code in SampleUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 04:49:27 +00:00
hanna	40a963541d	Uniquify the registered MXBean by adding an instanceNumber=... tag to the ObjectName. In the Queue-enabled future, we might want to come up with GUIDs (or at least semi-unique IDs) so that we could use JMX to track runtime attributes for multiple jobs running simultaneously. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3825 348d0f76-0448-11de-a6fe-93d51630548a	2010-07-19 00:58:54 +00:00

3255 Commits (52f24c86fad2ef4a3dd4b893f6d3f7dee72d10f4)