ebanks
77cace3aff
It's probably a good idea to look carefully at what you've done before actually committing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3863 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 17:06:54 +00:00
ebanks
44f9a631d6
Adding CombineVariants to release. Pining for the days when all of core/playground will be part of release by default...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3862 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 16:53:18 +00:00
delangel
5eef15cfdf
a) Bad bug fix to CombineVariants: when indels were being merged, the reference base provided was wrong - ref.getBases()[0] was being used, but this returns the base at the start of the window. Instead, the reference at the current locus should be used.
b) Cosmetic change to Beagle annotation description.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3861 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 15:13:47 +00:00
ebanks
4ff8b8fc0e
1. Fixing a bug that Mark found where indel-containing clipped reads would get an original cigar tag even when they didn't actually get modified.
2. Added some useful logging messages.
3. Added a oneoffs walker to calculate the number of realigned reads and intervals containing them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3860 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 14:24:01 +00:00
chartl
973934f769
Depth of coverage now uses longs rather than ints. We can now successfully run on the Lepidosiren paradoxa genome. (about 80 GB)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3859 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 14:14:12 +00:00
depristo
536399eaa0
Improvements to variant combine. Now calculates AC/AN/AF correctly by calling into the VariantAnnotator engine. Automatically removes annotations that are inconsistent across incoming VCs (in simpleMerge). TODO bug fix for Guillermo/Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3858 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 13:33:11 +00:00
aaron
9579aace1f
updates to code dependent on Tribble, as well as the following Tribble changes:
- makes writing to disk optional for indexes using the indexCreator classes (allow the user to specify the index file, if null don't write it)
- removed some system.out debugging code
- fixed version checking in interval tree
- made indexes store and return a LinkedHashSet for sequence names (to ensure they've preserved the ordering in the file)
- index creators now read the file before creating the index
- changed the Index.write() method to take a LEDataStream instead of a file
- removed the sequence dictionary code on the header
- added utils for getting LEDataStreams
- added a base Tribble exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3857 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 01:56:10 +00:00
ebanks
c5325b03be
1) Removed hard-coded strings. Please let's use the fields defined in VCFConstants.
2) General code cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3856 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-23 01:49:47 +00:00
hanna
e9d243babb
More improvements to exception handling during multithreaded runs based on
a bug reported by Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3855 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 22:13:01 +00:00
hanna
83798225ac
Repackaged datasource-specific command-line tools into their own package. Added a tag renamer tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3854 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 19:50:34 +00:00
delangel
98caedb5f0
Forgot to update VCF4 unit test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3853 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 16:25:51 +00:00
asivache
485023ba8e
this.intersect(that) method added to GenomeLoc (returns intersection of two intervals or dies if the locations do not overlap)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3852 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 16:00:30 +00:00
asivache
3308d956f4
Added utility shortcut method: getOriginalQualsInCycleOrder(read)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3851 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 15:44:25 +00:00
delangel
473ec91633
a) Bug fix in VCFHeader parsing - Info fields were not being parsed properly, with the result that the Count field was not being properly displayed in records (e.g. if Count=0 for a particular field, the INFO tag was still being displayed as ...;Field=x;... instead of ...;Field;...).
b) Bug fixes and update to how we represent indels and other complex events in a VariantContext object. Convention is now that all events are left aligned, with the first variant context location marking the common base before an event occurs. However, alleles in a VC don't have the common base in all VC's. Two new functions are now part of VariantContextUtils: CreateVariantContextWithPaddedAlleles and CreateVariantContextWithTrimmedAlleles. Both take a VC as an input and create a VC as an output.
Main flow is that a VCF reader would create a VC with trimmed alleles, all walkers would ideally work with these trimmed alleles, and then the VCF writer would pad back the alleles before writing. However, there are special cases where we need to pad alleles, for example when merging/combining VC's.
Pending issues:
- PED and DBSNP RODs have to be updated to create VC's for indels following the convention above. Changes will go in after Tribble location is moved and things are tested.
- Need to verify Indel genotyper and other modules that create VC's with indels.
- Wiki page describing convention above and how walkers should interpret indel VC's still needs updating/detailing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3850 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-22 02:36:45 +00:00
chartl
b696c3ea98
No more traversal reduce results.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3849 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 18:34:54 +00:00
chartl
365b42390d
Support for generating (very basic) wiggle files for use with IGV (see UCSC for wiggle spec); and a walker to take in a variant track and create a transition transversion rate track for the whole genome (due to the wiggle spec, this has to be done by chromosome). It's interesting to see the effect of genes!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3848 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 18:04:30 +00:00
corin
3596b1529f
module for listing out samples for data processing and firehose reporting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3847 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 15:05:41 +00:00
corin
1209b165bf
Now accepts command line args and prints paths to vcf, bams and beds
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3846 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 15:04:58 +00:00
depristo
f7957bc7f2
Fixed memory leak in VariantEval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3845 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 12:35:46 +00:00
aaron
cebddb5bd3
removing said file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3844 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 03:29:48 +00:00
aaron
25d26ab6ee
a little experiment for better integration with tribble; using external definition to pull other repositories into ours seamlessly. This will be deleted in a few minutes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3843 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-21 03:27:36 +00:00
aaron
1cba81c16f
updates to tribble with fixes for some bugs I've found in some new indexing code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3842 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 22:08:04 +00:00
ebanks
ff6748d1cd
oops - missed one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3841 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 18:55:19 +00:00
ebanks
c6ad26e04f
1) When quals/GQs are really integers (x.00), strip off the floating points.
2) Keep track of whether vcf records are unfiltered vs. pass filters in the variant context so we can regenerate the records on output.
3) No more "ID" hard-coded all over the code to set the VariantContext ID. Use a static variable instead.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3840 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 18:01:45 +00:00
ebanks
0db7fab1a9
Fixing genotype filtering for VF and adding integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3839 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:30:21 +00:00
aaron
2a6c2d3098
re-enable test; I was moving the input file in prep for my last commit around on Eric, so he rightfully removed the test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3838 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:14:59 +00:00
aaron
0108517b98
updating the Tribble track loading code to use the new shared locks, updated lots of new tests, add infrastructure for the TreeInterval, and removed the old locking class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3837 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:08:10 +00:00
aaron
af6b5f000e
updating the Tribble library; added writing of indexes to the index interface for working with the tree index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3836 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 07:02:08 +00:00
ebanks
f742980864
1. Refactoring of GenotypeWriters so that parallelization now works again with VCF4.0. We now have just a single reference to the old VCF classes, and that one will be purged soon.
2. Moved Jared's VCFTool code into archive so that everything would compile.
3. Added the vcf reference base (needed for indels) as an attribute to the VariantContext from the reader.
4. TribbleRMDTrackBuilderUnitTest was complaining that a validation file didn't exist, so I commented it out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3835 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-20 06:16:45 +00:00
depristo
70b07206a2
CombineVariants tests for Guillermo and Eric to explore the correctness of the in/out reader, writer behavior of the system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3834 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 22:41:48 +00:00
depristo
c47a5ff5ab
Official parallel CountCovariates, passes all integration tests. Now poster-child example of parallelism in GATK (Matt H). Apparent general performance improvements throughout too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3833 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 22:13:18 +00:00
rpoplin
0b56003d1a
Remove stray commented out line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3832 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 19:14:39 +00:00
rpoplin
8e31c01680
Solid processing in base quality recalibrator now has several options for how to handle no calls in the color space. --ignore_nocall_colorspace is removed and replaced by --solid_nocall_strategy. Fixed some of the @Deprecated tags in BaseUtils. LocusWalkers now filter out FailsVendorQualityCheck reads. HLA caller integration test bam file had bad vendor reads so its integration test changed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3831 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 19:10:29 +00:00
aaron
18b0114e25
remove FixBAMSortOrder walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3830 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 17:27:23 +00:00
aaron
35ce367898
adding the annotations for findbugs as dependencies in the GATK. They have to be in the default config so that we can
annotate code without running findbugs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3829 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 16:34:57 +00:00
kiran
b990a22bac
A very nice way of automatically plotting the results of a VariantEval run. All of the hard work is actually in the common R repository, gsacommons.R, including methods for creating a Venn diagram. It also provides a mechanism for the output of a VariantEval run to be loaded into a single list object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3828 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 12:38:26 +00:00
aaron
250ab70fed
update the Tribble library too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3827 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 05:00:37 +00:00
aaron
f4cfb0f990
The first step in integrating Jim's tree based index scheme:
- changed to a better method for getting headers from Codecs
- some removal of old commented out code in the GATKArgumentCollection
- changes for the rename of FeatureReader to FeatureSource
- removed the old Beagle ROD
- cleaned up some of the code in SampleUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 04:49:27 +00:00
hanna
40a963541d
Uniquify the registered MXBean by adding an instanceNumber=... tag to the
ObjectName. In the Queue-enabled future, we might want to come up with GUIDs
(or at least semi-unique IDs) so that we could use JMX to track runtime
attributes for multiple jobs running simultaneously.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3825 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 00:58:54 +00:00
ebanks
5a1a3fc79a
Fix bad VariantContext creation in unit test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3824 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 20:21:01 +00:00
depristo
7c42e6994f
FindBugs fixes throughout the code base
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3823 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 16:29:59 +00:00
ebanks
693672a461
Refactoring the VCF writer code; now no longer uses VCFRecord or any of its related classes, instead writing directly to the writer. Integration tests pass, but some are actually broken and will be fixed this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3822 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 13:19:56 +00:00
ebanks
379584f1bf
Re-enable (most of) these tests. Guillermo will re-enable the other one when the VCF->VC conversion is done for indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3821 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 03:24:28 +00:00
ebanks
982947d328
update to deal with partial indels (I/D with no bases) in the HM records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3820 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 02:56:37 +00:00
depristo
25a27b78bc
1KG Table 1 counting pipeline. Useful example
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3819 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:30:56 +00:00
depristo
414ec6f20a
Removing version argument constructors that shouldn't be used. Temporarily allow -- with global variant to indicate this should be removed -- header records without description fields. Real error checking in the headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3818 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:30:08 +00:00
depristo
14b21e487b
always 4.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3817 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:48 +00:00
depristo
d40299840c
indenting clean up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3816 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:28:28 +00:00
hanna
9207c58b8f
A fix for the integration test I broke on Friday on my way out the door --
some workflows using AlignmentContext were working with it in a way I didn't
expect and wound up treating extended pileups as base pileups. I'll work to
make sure the AlignmentContext interface is crystal clear.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3815 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 22:22:44 +00:00
chartl
ea117957b9
Add CountFunctionalVariants to local release (for firehose)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3814 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-17 01:36:16 +00:00