hanna
c9d5345150
Redo StratifiedAlignmentContext to use ReadBackedPileup's stratification options.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3699 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 02:46:05 +00:00
delangel
dc4715c9c6
Permit empty fields in INFO and FORMAT structures - not fully tested yet, but at least the cases that failed before now pass. Also corrected a bug where, when reading 3.3 VCFs or VCFs with no original allele encodings, we'd always print 2 bases per allele.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3698 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 01:56:07 +00:00
depristo
5f2b2d860e
Final stage of renaming
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3696 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 21:39:07 +00:00
depristo
6e7927a47d
Continuing the renaming nightmare...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3695 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:25:01 +00:00
depristo
9d7d5f1747
Continuing the renaming nightmare...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3694 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:24:27 +00:00
depristo
aa20c52b88
deleting vcf
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3693 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:19:15 +00:00
depristo
4195fc5c4e
renaming part 2...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3692 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:18:11 +00:00
depristo
6c9da5525d
renaming starting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3691 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:16:51 +00:00
depristo
b8d6a95e7a
Preliminary commit of new VCFCombine, soon to be called CombineVariants (next commit), that supports merging any number of VCF files via a general VC merge routine that supports prioritization and merging of samples! It's now possible to merge the pilot1/2/3 call sets into a single (monster) VCF taking genotypes from pilot2, then pilot3, then pilot1 as needed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3690 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 20:13:03 +00:00
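A minimal sketch of the prioritized genotype merge described in the commit above, assuming each call set is reduced to a map from sample name to genotype string. The class `PriorityMergeSketch` and the method `mergeGenotypes` are hypothetical names for illustration and are not taken from the GATK source.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PriorityMergeSketch {
    /**
     * Hypothetical helper: walks the call sets in priority order and keeps,
     * for each sample, the genotype from the first call set that has one.
     */
    public static Map<String, String> mergeGenotypes(
            List<String> priority,
            Map<String, Map<String, String>> callSets) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String name : priority) {
            Map<String, String> genotypes = callSets.get(name);
            if (genotypes == null) {
                continue; // call set not supplied; fall through to the next one
            }
            for (Map.Entry<String, String> e : genotypes.entrySet()) {
                // A higher-priority call set already claimed this sample if present.
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged;
    }
}
```

With the priority list pilot2, pilot3, pilot1, a sample genotyped in both pilot2 and pilot1 keeps its pilot2 genotype, and samples seen only in pilot1 are still carried over.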
kshakir
178cf64a0c
Refactored ArgumentDefinition to absorb functionality from ArgumentDefinition and ArgumentTypeDescriptor.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3688 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 18:37:58 +00:00
chartl
569456850d
Mark pointed out there's differentiation in the filter field. Rolling back.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3687 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 17:05:53 +00:00
chartl
52a474b27d
Fixed an issue with VCF combine in sites like the following:
...
Broad: Filtered BC: No call
These were being treated the same as
Broad: Call BC: No call
Added some verbosity to separate them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3686 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:49:31 +00:00
ebanks
944dbb94ce
Refactored and generalized the database/comp annotations in VariantAnnotator. Now one can provide comp tracks as with VariantEval (e.g. compHapMap, comp1KG_CEU) and the INFO field will be annotated with the track name (without the 'comp') if the variant record overlaps a comp site (e.g. ...;1KG_CEU;...). This means that you can now pass 1kg calls to the Unified Genotyper and automatically have records annotated with their presence in 1kg.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3684 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:37:31 +00:00
ebanks
47c4a70ac1
It turns out that it is legitimately possible for there to be reads that won't overlap within a target interval for cleaning. While we don't want to attempt cleaning, we also don't want to fail.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3682 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 15:50:44 +00:00
ebanks
ae33d8a2f2
I just wanted one more vote. It's settled: we die.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3681 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 14:00:56 +00:00
ebanks
8fb37f5f7a
For Kiran: warn the user when the actual and vcf ref bases differ so that if an exception is generated later, he knows why. All: should we generate the actual exception here? Is there any reason to allow cases where the vcf record has a different ref base than the actual reference? I'd vote that we die here. Thoughts?
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3680 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 13:56:16 +00:00
delangel
d932322190
More necessary fixes for VCF4.0 - now results look more sensible in realistic, bigger VCF files produced by say Dindel and not just the small test VCF:
...
- Fixed and cleaned code to produce trailing and padding bases in alleles around indels.
- Deal better with missing fields.
Pending:
- Chopping missing fields at end of genotypes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3679 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 02:59:30 +00:00
ebanks
12c0de6170
Added ability to clean using only known indels. Added integration test for it. Fixed vcf->vc conversion for indels which was busted.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3678 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 01:20:56 +00:00
chartl
610cc7ae2b
Cool package trick Kiran showed me. VariantEvaluator no longer public, AAT specifies the core package even though it lives in oneoffs. Disabled so integration tests pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3677 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 22:42:04 +00:00
chartl
4c6f4e41c6
Include making VariantEvaluator public within the package so my oneoffs can be seen (not included in previous submit specifically because I didn't want to break the build by changing anything in core...the road to hell is paved with good intentions)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3676 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 22:26:52 +00:00
chartl
9ac13b8f5d
Name and body change for this module to reflect local code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3675 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:45:26 +00:00
aaron
844cb2ed33
fixing a bug that Eric found with RODs for reads, where some records could be omitted. Sorry Eric!
...
Also putting more tolerance into the timing on the tibble index tests (that check to make sure we're deleting out of date indexes, and not deleting perfectly good indexes). It seems that some of the farm nodes aren't great with a stopwatch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3674 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:38:55 +00:00
chartl
101c27294d
Comment this guy out so we build again. (Hate it when my repository goes all funky.)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3673 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:16:33 +00:00
chartl
3017f82550
Initial commit of items for analyzing amino acid transitions in variant eval. Blew up my subversion by coding locally while I did not have internet. I hope this doesn't bust any integration tests since I changed no existing code but...who knows. Crossing my fingers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3672 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 20:57:18 +00:00
delangel
e3fb4d5c70
Intermediate checkin, just to fix null pointer exception that happened when merging implementation with latest VCF4 decoder - field ORIGINAL_ALLELE_LIST in vc shouldn't be written in infoFields structure since this won't be output to file and there is no legal structure under this key.
...
Base encoding for complex events is still brittle and most probably still has issues, fixes upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3671 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 20:57:09 +00:00
ebanks
baf9479c35
An addition for Sendu since he can't seem to tell when his CountCovariate jobs die in the middle of writing the CSVs. We now write an EOF marker at the end of the covariates table and look for it when reading in the file in TableRecalibrationWalker. By default, we warn the user if the EOF marker isn't present, but we exception out if the user provides the --fail_with_no_eof_marker option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3670 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 18:50:07 +00:00
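A minimal sketch of the EOF-marker check described in the commit above, assuming the table has already been read into a list of lines. The marker text `EOF`, the class `EofMarkerSketch`, and the `failOnMissing` parameter (standing in for the --fail_with_no_eof_marker option) are hypothetical; the actual marker and code in TableRecalibrationWalker may differ.

```java
import java.util.Arrays;
import java.util.List;

public class EofMarkerSketch {
    // Hypothetical marker text; the real marker may be different.
    public static final String EOF_MARKER = "EOF";

    /** Returns true if the last non-blank line of the table is the marker. */
    public static boolean hasEofMarker(List<String> lines) {
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i).trim();
            if (!line.isEmpty()) {
                return line.equals(EOF_MARKER);
            }
        }
        return false; // empty file: no marker
    }

    /**
     * Warns by default when the marker is missing; throws when failOnMissing is set.
     * Returns whether the table looks complete.
     */
    public static boolean checkTable(List<String> lines, boolean failOnMissing) {
        if (hasEofMarker(lines)) {
            return true;
        }
        if (failOnMissing) {
            throw new IllegalStateException("Covariates table has no EOF marker; it may be truncated");
        }
        System.err.println("WARNING: covariates table has no EOF marker; it may be truncated");
        return false;
    }

    public static void main(String[] args) {
        List<String> truncated = Arrays.asList("rg1,30,12,4", "rg1,31,9,1");
        // Warns on stderr and reports the table as incomplete.
        System.out.println(checkTable(truncated, false));
    }
}
```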
delangel
3ca2b7374b
Fixes to better deal with the "Type" and "Number" fields in the INFO and FORMAT header lines in VCF4.0. We now record these fields and provide appropriate conversions. This is the first version that fully passes the VCF validator.
...
Also, moved the flag indicating VCF4.0 to the VCFWriter constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3669 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 16:43:00 +00:00
ebanks
801b47c6e9
For Sendu: a similar addition to the Indel Genotyper allowing it to emit a metrics file (which for now consists only of # of normal/tumor calls made)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3668 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 13:19:17 +00:00
ebanks
ddf87e61c2
For Sendu: optionally emit a metrics file with callability info (including number of actual calls made) from UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3667 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 12:57:28 +00:00
ebanks
929e5b9276
Fix possible null pointer exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3666 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 09:01:18 +00:00
hanna
2953c9f069
Efficiency improvement requested by the Picard team in IndexedFastaSequenceFile: improve the memory efficiency
...
(and loading time) of long reference sequences by better controlling the input buffer size.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3665 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 07:22:07 +00:00
delangel
ed71e53dd4
1) Initial complete version of VCF4 writer. There are still issues (see below) but at least this version is fully functional. It incorporates getting rid of intermediate VCFRecord so we now operate from VariantContext objects directly to VCF 4.0 output.
...
See VCF4WriterTestWalker for usage example: it just amounts to adding
vcfWriter.add(vc,ref.getBases()) in walker.
add() method in VCFWriter is polymorphic and can also take a VCFRecord, although eventually this should be obsolete.
addRecord is still supported so all backward compatibility is maintained.
Resulting VCF4.0 are still not perfect, so additional changes are in progress. Specifically:
a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1").
b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles.
Both issues should be corrected with better header parsing.
2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 23:54:38 +00:00
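A minimal sketch of the intended output for issue (a) in the commit above: INFO codes of length 0 are written as the bare key, while all other codes are written as key=value. The class `InfoFieldSketch`, the method `formatInfo`, and the convention that a null or empty value marks a flag are assumptions made for illustration, not the VCFWriter implementation.

```java
import java.util.Map;
import java.util.StringJoiner;

public class InfoFieldSketch {
    /**
     * Hypothetical helper: renders an INFO column from key/value pairs.
     * A null or empty value is treated as a length-0 flag (e.g. HM, DB).
     */
    public static String formatInfo(Map<String, String> info) {
        if (info.isEmpty()) {
            return "."; // VCF uses "." for a missing INFO column
        }
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> e : info.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                joiner.add(e.getKey());               // flag: key only, no "=1"
            } else {
                joiner.add(e.getKey() + "=" + value); // regular key=value pair
            }
        }
        return joiner.toString();
    }
}
```

Passing an insertion-ordered map (such as a LinkedHashMap) keeps the fields in the order they were added.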
ebanks
4a451949ba
add parallel option to target creator for masking out reads with bad mates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3663 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 22:13:25 +00:00
ebanks
6a23edd911
Fix performance tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3662 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 21:51:48 +00:00
chartl
20f5fdbcf7
Changes to MVC to make the header of its output VCF compliant with spec (give expected # of values for info field annotations)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3660 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 18:33:23 +00:00
aaron
62d22ff1aa
adding the original allele list to a variant context (as the annotation ORIGINAL_ALLELE_LIST), in the case where the set alleles are the result of clipping. Added tests for both cases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3658 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 17:23:46 +00:00
ebanks
1292c96e29
The cleaner now adds the OC (original cigar) and OS (original alignment start) tags as appropriate to reads that get realigned; this feature can be turned off. Also, improved integration tests (sorry, Kiran!).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3657 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:46:47 +00:00
asivache
cc8d8eaedb
Now that we always reserve space for two read ends when collecting stats stratified by libraries, we need to check that the second end was indeed present; otherwise the pointer is null and this was causing an exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3656 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:40:16 +00:00
ebanks
9a24598a98
By default, don't clean reads with mates mapped to other chromosomes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3654 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:14:20 +00:00
ebanks
bf5cbad04c
Make the target creator a rod walker (that allows reads) so that we can easily trigger the cleaner on only known indel sites. Adding an integration test to cover this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3651 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 13:28:37 +00:00
ebanks
464ac63a22
Allowing N's in ALT field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3650 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 11:41:32 +00:00
hanna
3a9d426ca8
Added hasPileupBeenDownsampled() boolean to ReadBackedPileup, so that a pileup can report whether or not (but not how much) it's been downsampled.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3649 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 04:56:33 +00:00
ebanks
8e848ccd84
SAMFileWriters can now write to /dev/null without throwing exceptions, so we can remove the try/catch blocks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3648 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-27 03:59:10 +00:00
aaron
09ccdf83b2
fixing a broken test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3647 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:59:00 +00:00
depristo
d6cbe4d0ad
Bug fixes to support haploid genotypes and an optimization for indexing. The parser now tracks the line of the VCF and catches errors so it can report the line number and the line where a parsing error occurred.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3646 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:08:41 +00:00
aaron
5f8a3f95ef
The GT field once again reigns supreme (it must be the first genotype field). Thanks for the catch Eric.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3645 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:03:05 +00:00
kshakir
75c98c42b8
Started path of deprecation of Sting's @Argument by splitting the annotation into @Output and @Input. Anything that's not an @Output should be an @Input.
...
Checked in example qscripts that are basically todo integration tests.
Replaced use of queue @Input/@Output with Sting's new @Input/@Output. This means you'll now have to document the annotations.
More work on dependency resolution cycles being created in the graph during scatter/gather.
Filtering nulls to avoid NPEs in scala's 'Collection'.hashCode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3643 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:51:13 +00:00
weisburd
147ba68441
Fixed bug with mrnaCoord field - made it count exon positions only, rather than introns & exons
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3642 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 19:53:32 +00:00
aaron
d3848745ab
moving VCF 3.3 back into the GATK so Guillermo can make changes for VCF 4 output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3639 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:06 +00:00
aaron
b3edb7dc08
two fixes for the VCF 4 parser:
...
- Allow the "GT" field in genotypes at any point in the genotype string (previously we required it to be the first key-value pair).
- Fix a bug with the phasing value put into the VariantContext, thanks for the catch Guillermo!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3638 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:01:23 +00:00