delangel
3ca2b7374b
Fixes to better deal with the "Type" and "Number" fields in the INFO and FORMAT header lines in VCF4.0. We now record these fields and provide appropriate conversions. This is the first version that fully passes the VCF validator.
...
Also, moved the flag indicating VCF4.0 to the VCFWriter constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3669 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 16:43:00 +00:00
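The idea of recording "Type" and "Number" from a header line and using them to convert values can be sketched roughly as follows. This is an illustrative Python sketch, not the GATK code: the names parse_header_line and convert_value are hypothetical, and the regular expression assumes the VCF 4.0 key order ID, Number, Type, Description.

```python
import re

# Hypothetical helpers for illustration only; not the GATK/Tribble API.
# Assumes the key order ID, Number, Type, Description inside the brackets.
HEADER_RE = re.compile(
    r'##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="(.*)">'
)

CONVERTERS = {"Integer": int, "Float": float, "String": str, "Character": str}


def parse_header_line(line):
    """Record the ID, Number and Type declared by an INFO/FORMAT header line."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not an INFO/FORMAT header line: %r" % line)
    kind, field_id, number, value_type, description = match.groups()
    return {
        "kind": kind,
        "id": field_id,
        "number": number,
        "type": value_type,
        "description": description,
    }


def convert_value(header, raw):
    """Convert the raw string of a field according to its declared Type."""
    if header["type"] == "Flag":
        return True  # a flag carries no value; its presence is the information
    convert = CONVERTERS[header["type"]]
    values = [convert(v) for v in raw.split(",")]
    # A field declared with Number=1 holds a single value, otherwise a list.
    return values[0] if header["number"] == "1" else values
```

With a header declaring AF as Number=1, Type=Float, the raw string for AF would come back as a float rather than a string.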
ebanks
801b47c6e9
For Sendu: a similar addition to the Indel Genotyper allowing it to emit a metrics file (which for now consists only of # of normal/tumor calls made)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3668 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 13:19:17 +00:00
ebanks
ddf87e61c2
For Sendu: optionally emit a metrics file with callability info (including number of actual calls made) from UG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3667 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 12:57:28 +00:00
ebanks
929e5b9276
Fix possible null pointer exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3666 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 09:01:18 +00:00
hanna
2953c9f069
Efficiency improvement requested by the Picard team in IndexedFastaSequenceFile: improve the memory efficiency (and loading time) of long reference sequences by better controlling the input buffer size.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3665 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 07:22:07 +00:00
delangel
ed71e53dd4
1) Initial complete version of the VCF4 writer. There are still issues (see below), but at least this version is fully functional. It removes the intermediate VCFRecord, so we now go directly from VariantContext objects to VCF 4.0 output.
...
See VCF4WriterTestWalker for usage example: it just amounts to adding
vcfWriter.add(vc,ref.getBases()) in walker.
The add() method in VCFWriter is overloaded and can also take a VCFRecord, although eventually this should be obsolete.
addRecord is still supported, so all backward compatibility is maintained.
The resulting VCF4.0 files are still not perfect, so additional changes are in progress. Specifically:
a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1").
b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles.
Both issues should be corrected with better header parsing.
2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 23:54:38 +00:00
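The two output issues listed in that commit (flags written as "HM=1", and Integer values printed as doubles) both come down to consulting the declared header type when writing a value. A minimal Python sketch of that idea follows; format_info and the header_types mapping are hypothetical names, it is not the VCFWriter implementation, and it shows INFO values only, though the same approach would presumably apply to genotype values.

```python
def format_info(info, header_types):
    """Render an INFO column from a dict of values, using declared header types.

    header_types maps a field ID to its declared Type ("Flag", "Integer", ...).
    Both arguments are illustrative assumptions, not GATK data structures.
    """
    parts = []
    for key, value in info.items():
        value_type = header_types.get(key, "String")
        if value_type == "Flag":
            # A flag is emitted as the bare key ("HM"), never as "HM=1".
            if value:
                parts.append(key)
        elif value_type == "Integer":
            # Values declared Integer are printed without a decimal part.
            parts.append("%s=%d" % (key, int(value)))
        else:
            parts.append("%s=%s" % (key, value))
    # An empty INFO column is written as the missing-value marker ".".
    return ";".join(parts) if parts else "."
```

Keeping the type lookup in one place means a fix to header parsing automatically carries through to every field the writer emits.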
ebanks
4a451949ba
add parallel option to target creator for masking out reads with bad mates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3663 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 22:13:25 +00:00
ebanks
6a23edd911
Fix performance tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3662 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 21:51:48 +00:00
kshakir
dce2c17404
Added "-bsubWait" where Queue waits for all the jobs to exit before exiting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3661 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 19:52:17 +00:00
chartl
20f5fdbcf7
Changes to MVC to make the header of its output VCF compliant with the spec (give the expected # of values for info field annotations)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3660 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 18:33:23 +00:00
kshakir
c047232b18
Using picard for bam merging.
...
Properties now propagate to scatter/gather functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3659 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 17:59:09 +00:00
aaron
62d22ff1aa
adding the original allele list to a variant context (as the annotation ORIGINAL_ALLELE_LIST), in the case where the set alleles are the result of clipping. Added tests for both cases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3658 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 17:23:46 +00:00
ebanks
1292c96e29
The cleaner now adds the OC (original cigar) and OS (original alignment start) tags as appropriate to reads that get realigned; this feature can be turned off. Also, improved integration tests (sorry, Kiran!).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3657 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:46:47 +00:00
asivache
cc8d8eaedb
Now that we always reserve space for two read ends when collecting stats stratified by libraries, we need to check that the second end was indeed present; otherwise the pointer is null and this was causing an exception
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3656 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:40:16 +00:00
kiran
26ef1f84bf
Updates to not depend on an environment variable to figure out where libraries are (helpful for installation at the Sanger).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3655 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:46:41 +00:00
ebanks
9a24598a98
By default, don't clean reads with mates mapped to other chromosomes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3654 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 15:14:20 +00:00
weisburd
e7939f7036
Fixed error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3653 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 14:50:28 +00:00
kiran
b2127e59c4
A first draft of scripts and LaTeX templates required to automatically generate slides for the 1,000 Genomes Automated Data Processing Report.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3652 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 14:30:50 +00:00
ebanks
bf5cbad04c
Make the target creator a rod walker (that allows reads) so that we can easily trigger the cleaner on only known indel sites. Adding an integration test to cover this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3651 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 13:28:37 +00:00
ebanks
464ac63a22
Allowing N's in ALT field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3650 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 11:41:32 +00:00
hanna
3a9d426ca8
Added hasPileupBeenDownsampled() boolean to ReadBackedPileup, so that a pileup can report whether or not (but not how much) it's been downsampled.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3649 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 04:56:33 +00:00
ebanks
8e848ccd84
SAMFileWriters can now write to /dev/null without throwing exceptions, so we can remove the try/catch blocks.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3648 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-27 03:59:10 +00:00
aaron
09ccdf83b2
fixing a broken test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3647 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:59:00 +00:00
depristo
d6cbe4d0ad
Bug fixes to support haploid genotypes, plus an optimization for indexing. The parser now tracks the current line of the VCF and catches errors, so it can report the line number and the line itself when a parsing error occurs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3646 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:08:41 +00:00
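Tracking the line number so that parse errors can report it might look like the following. This is a hedged Python sketch under assumed names (VCFParseError, parse_records and the parse_line callback are all hypothetical), not the parser changed in the commit above.

```python
class VCFParseError(Exception):
    """Parse failure that remembers which line of the file caused it."""

    def __init__(self, line_no, line, cause):
        super().__init__(
            "line %d: %s (offending line: %r)" % (line_no, cause, line)
        )
        self.line_no = line_no
        self.line = line


def parse_records(lines, parse_line):
    """Yield parsed records, wrapping any failure with the line number and text.

    parse_line is a caller-supplied function for a single data line; header
    lines (starting with '#') and blank lines are skipped but still counted.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        try:
            record = parse_line(line)
        except Exception as exc:
            raise VCFParseError(line_no, line, exc) from exc
        yield record
```

Counting every line, including headers, keeps the reported number in step with what a text editor shows for the same file.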
aaron
5f8a3f95ef
The GT field once again reigns supreme (it must be the first genotype field). Thanks for the catch Eric.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3645 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:03:05 +00:00
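The "GT must be first" rule restored in that commit can be illustrated with a small check on the FORMAT column. This Python sketch assumes the rule is "if GT is present, it must be the first key"; check_gt_first is a hypothetical name and not part of the VCF 4 parser itself.

```python
def check_gt_first(format_column):
    """Split a FORMAT column such as "GT:DP:GQ" and enforce GT placement.

    Assumed rule for this sketch: GT may be absent, but when present it has
    to be the first key.
    """
    keys = format_column.split(":")
    if "GT" in keys and keys[0] != "GT":
        raise ValueError(
            "GT must be the first genotype field, got %r" % format_column
        )
    return keys
```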
kshakir
894ad354fa
Fixed typo in the name of the shell directory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3644 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:59:40 +00:00
kshakir
75c98c42b8
Started the deprecation path for Sting's @Argument by splitting the annotation into @Output and @Input. Anything that's not an @Output should be an @Input.
...
Checked in example qscripts that are basically todo integration tests.
Replaced use of Queue's @Input/@Output with Sting's new @Input/@Output. This means you'll now have to document the annotations.
More work on dependency resolution cycles being created in the graph during scatter/gather.
Filtering nulls to avoid NPEs in Scala's 'Collection'.hashCode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3643 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 20:51:13 +00:00
weisburd
147ba68441
Fixed bug with mrnaCoord field - made it count exon positions only, rather than introns & exons
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3642 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 19:53:32 +00:00
kshakir
ce27ed0d60
Added missing @ClassType to memory limits.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3641 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:56:35 +00:00
aaron
dff4c06763
Rev'ing Tribble with a special version that has excluded VCF 3.3
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3640 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:51 +00:00
aaron
d3848745ab
moving VCF 3.3 back into the GATK so Guillermo can make changes for VCF 4 output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3639 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:20:06 +00:00
aaron
b3edb7dc08
two fixes for the VCF 4 parser:
...
- Allow the "GT" field in genotypes at any point in the genotype string (previously we required it to be the first key-value pair).
- Fix a bug with the phasing value put into the VariantContext, thanks for the catch Guillermo!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3638 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:01:23 +00:00
weisburd
e15fe6858e
Disabling test. The big-tables will need updating soon; will re-enable after updating the md5
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3637 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 15:43:41 +00:00
aaron
efa60e5de5
and add changes to the vcf used in testing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3636 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 02:56:02 +00:00
aaron
f9c7803d4e
this got left off my last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3635 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 02:42:44 +00:00
weisburd
1cb8f51f8c
Fixed -t arg
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3634 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:44:10 +00:00
weisburd
3cd0570c1e
Now can run with multiple processes, multiple threads, or both
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3633 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:25:01 +00:00
weisburd
dae3ce2c0f
changed log dir
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3632 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 23:08:13 +00:00
weisburd
fea8054e9e
Updated long name for -l to --run-locally
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3631 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:26:45 +00:00
aaron
682f9b46c6
Two fixes together:
...
1) Some improvements to the VCF4 parsing, including disabling validation.
2) Reimplemented RefSeq in the new Tribble-style rod system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:17:03 +00:00
weisburd
72e669538e
Updated arg description for -s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3629 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:04:01 +00:00
aaron
62bc7651a8
fix for PSPW with DbSNP mask. Added an integration test for this case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3628 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:31:32 +00:00
hanna
4840ef6d3e
Another rev of picard for /dev/null writing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3627 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:22:47 +00:00
hanna
c32f9d78ae
Rev picard again, this time for error writing to /dev/null.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3626 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 04:08:26 +00:00
corin
bcab0eba01
This replaces tearsheet.r, neatens up graphics, and allows the script to be used in R's interactive environment
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3625 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 01:02:58 +00:00
aaron
8a9b2f4256
removing the GLF ROD.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3624 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 22:51:45 +00:00
asivache
17d2043354
bug fix: now contigs not present in the sequence dictionary are registered properly and do not cause the script to break
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3623 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:59:38 +00:00
aaron
611d834092
a couple of VCF 4 improvements:
...
-Validation of INFO and FORMAT fields.
-Conversion to the correct type for info fields (e.g. allele frequency is now stored as a float instead of a string).
-Checks for CNV-style alternate allele encodings (e.g. <INS:ME:L1>); right now we throw an exception. Maybe we should just warn the user?
-Tests for the multiple-base polymorphism allele case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3622 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:21:43 +00:00
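The open question in that commit (throw on a CNV-style allele, or just warn) can be made concrete with a small classifier. This is a Python sketch under stated assumptions: classify_alt_allele and its strict switch are hypothetical, and the patterns only cover plain base strings (including N) and bracketed symbolic alleles.

```python
import re
import warnings

# Illustrative patterns only: plain bases (N allowed) and <...> symbolic alleles.
BASES_RE = re.compile(r"^[ACGTNacgtn]+$")
SYMBOLIC_RE = re.compile(r"^<[^<>]+>$")


def classify_alt_allele(allele, strict=True):
    """Return "bases" or "symbolic" for an ALT allele string.

    With strict=True a symbolic allele such as <INS:ME:L1> raises an error;
    with strict=False it only triggers a warning and is reported as symbolic.
    """
    if BASES_RE.match(allele):
        return "bases"
    if SYMBOLIC_RE.match(allele):
        if strict:
            raise ValueError("symbolic ALT allele not supported: %s" % allele)
        warnings.warn("skipping symbolic ALT allele: %s" % allele)
        return "symbolic"
    raise ValueError("malformed ALT allele: %r" % allele)
```

A switch like strict lets callers choose between the two behaviours without changing the parsing code.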
aaron
54ae0b8e4e
some updates to tribble for the svn commit that will follow
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3621 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:20:07 +00:00
ebanks
f0fc34bb8e
Bug fix: N's are allowed in the ref so don't fail when e.g. dbsnp has an N!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3620 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 17:49:14 +00:00