ebanks
5a1a3fc79a
Fix bad VariantContext creation in unit test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3824 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 20:21:01 +00:00
ebanks
693672a461
Refactoring the VCF writer code; now no longer uses VCFRecord or any of its related classes, instead writing directly to the writer. Integration tests pass, but some are actually broken and will be fixed this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3822 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 13:19:56 +00:00
ebanks
379584f1bf
Re-enable (most of) these tests. Guillermo will re-enable the other one when the VCF->VC conversion is done for indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3821 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-18 03:24:28 +00:00
delangel
55b756f1cc
First step in major cleanup/redo of VCF functionality. Specifically, now:
a) The VCF track name works again with both 3.3 and 4.0 VCFs when specifying -B name,VCF,file. The code reads the header and parses the version automatically.
b) The old VCF codec is deprecated. The reader now goes directly from parsing VCF lines to producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to still input files using the old method, a new VCF3Codec is in place with the old code, but it will eventually be deleted.
c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format.
d) As a consequence, the existing GATK bug where files are produced with VCF4 body but VCF3.3 headers is solved.
e) Several VCF 4.0 writer bugs are now solved.
f) Integration test MD5s have changed, mostly because of the corrected VCF4.0 headers and because the validation data now mostly uses VCF4.0.
g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension.
Pending issues:
a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes.
b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 22:49:16 +00:00
aaron
36ac73cf9a
comment out broken test until it can be fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3810 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 20:04:40 +00:00
hanna
96034aee0e
Cleanup for Steve Hershman's issue. In the midst of doing this, I discovered
that the semantics for which reads are in an extended event pileup are not
clear at this point. Eric and I have planned a future clarification for this
and the two of us will discuss who will implement this clarification and when
it'll happen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3809 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 18:57:58 +00:00
aaron
ec94cfdf05
remove unit test for VCF writer, it's not applicable now that we produce only VCF4. Guillermo, it's up to you if you want to adapt this or remove it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3803 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:33:25 +00:00
depristo
b29eda83bb
Parallelized CountCovariates! percent_ref_called_var is now a standard genotype concordance module (for validation!). Much smarter merging of headers for combineVariants. VCF codecs now actually look at the file version and blow up if it is the wrong version. setHeaderVersion() in VCFHeaderLine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3802 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 14:10:18 +00:00
ebanks
e7e58d7129
The SAM spec has now officially reserved my new tags for original cigar and original alignment start... except that OS has been named OP ('original POS')
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3800 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 00:09:36 +00:00
ebanks
a4f8d70d8d
oops, forgot to update this integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3788 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:38:33 +00:00
ebanks
460283f6d2
No more manually converting VariantContexts to VCFRecords. You should be utilizing VCs and not VCFRecords.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3787 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 05:21:28 +00:00
ebanks
6b5c88d4d6
The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 04:56:58 +00:00
ebanks
9a05e8143d
Move to 4.0 and away from VCFRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3780 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:54:54 +00:00
ebanks
7e7da75d27
Moving over to 4.0 and away from VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3778 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 14:07:10 +00:00
ebanks
d896d03554
Moving VF to vcf 4.0. Still need to fix genotype filters.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3777 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 11:39:51 +00:00
ebanks
76b3b39720
Technically, Mark broke this with his commit earlier. But since I had an outstanding broken test, I lose and have to fix this one too...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3776 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 03:58:38 +00:00
ebanks
1bef7dd170
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3775 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 00:56:12 +00:00
ebanks
52c534a8f2
Updating to VCF 4.0
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3770 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 20:18:30 +00:00
ebanks
e50627a49e
1. Updated tests and added integration test for liftover code.
2. Updated liftover code (and scripts) to emit vcf 4.0 and no longer depend on VCFRecord.
3. Beagle walker now also emits vcf 4.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3767 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 17:58:18 +00:00
ebanks
221e01fb27
deleting/archiving as instructed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3765 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:59:45 +00:00
ebanks
e75b3e13bd
updating unit test for previous fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3761 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 03:23:53 +00:00
ebanks
fb717fe128
First pass needed to remove old VCF code: moving all VCF-related constants into a single unified class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3759 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-11 07:19:16 +00:00
chartl
ea8fd506bf
Update to PickSequenomProbes: Option to ignore mask sites within X bp of a variant (very useful for indels where dbSNP entries near the indel are almost always false SNP calls). Also fixed an integration test where the variant site itself, being in dbSNP, was represented as [N/C] rather than [A/C]. Added integration test for 1bp no-mask window.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3753 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 04:03:19 +00:00
depristo
45fb614296
Fixes to VE for obscure bug, as well as disabled integration test for CombineVariants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3749 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-09 00:13:07 +00:00
ebanks
6e6ad36523
reallow MNP events through
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3740 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 06:26:52 +00:00
ebanks
9a81f1d7ef
Fixed this tool for chartl so that it now properly handles deletions. Added deletion case to integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3737 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-08 04:45:59 +00:00
hanna
9fc05ac2ae
eagerDecode is now false.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3733 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 22:51:48 +00:00
ebanks
4bc3ad2194
Shame on me: UG was emitting negative QUALs (-0) in all_bases mode. Thanks, Matt.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3732 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:30:22 +00:00
ebanks
30714ec8d9
As per a quick chat with Richard Durbin, don't increase the mapping quality of realigned reads too much; for now, arbitrarily increase the MQ by 10. We need to figure out a better solution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3731 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-07 20:12:59 +00:00
aaron
86031f4034
part two: todo's in combine variants, fixes for InferredGeneticContext, and some other tests and clean-up.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3721 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 21:07:53 +00:00
ebanks
36edc60ccc
Connected UG to the new comp track annotation system in VA. Also, when emit confidence is lower than call confidence (so that we emit records filtered with LowQual), add a corresponding FILTER header field to the VCF so that the validator doesn't complain.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3720 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 13:04:24 +00:00
aaron
3347d1ca7c
part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 05:57:58 +00:00
weisburd
9ec393bfce
Updated md5 - vcf header line change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3714 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 21:02:09 +00:00
depristo
61e2b2e39b
Nearly finalize merging capabilities for CombineVariants. Support for dealing with inconsistent indel alleles at loci. Improvements to Allele and removal of addAllele to MutableGenotype. We are close to being able to merge all of 1000 genomes -- snps and indels -- into a single combined vcf
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3710 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 13:32:33 +00:00
aaron
3093a20a55
fixing VCF header format and info fields so that they properly emit the unbounded count value for vcf4 or vcf3. Eric, we should update the vcf4 spec page to indicate that format fields are also allowed to use the unbounded count (if this is true).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3707 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 22:02:16 +00:00
rpoplin
255b036fb5
Variant Recalibrator MLE EM algorithm is moved over to variational Bayes EM in order to eliminate problems with singularities when clustering in higher than two dimensions. Because of this there is no longer a number of Gaussians parameter. Wiki will be updated shortly with new recommended command.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3704 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 18:51:07 +00:00
aaron
43ca595d15
VCF headers now can be set to a particular VCF version after creation, which converts the header lines to the appropriate encoding on output. Plus some clean-up of the code.
Also commented out the Tribble index out-of-date tests, the timing seems to be troublesome from the farm.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3702 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 05:32:14 +00:00
hanna
4995950d04
IndexedFastaSequenceFile is now in Picard; transitioning to that implementation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3701 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 04:40:31 +00:00
ebanks
944dbb94ce
Refactored and generalized the database/comp annotations in VariantAnnotator. Now one can provide comp tracks as with VariantEval (e.g. compHapMap, comp1KG_CEU) and the INFO field will be annotated with the track name (without the 'comp') if the variant record overlaps a comp site (e.g. ...;1KG_CEU;...). This means that you can now pass 1kg calls to the Unified Genotyper and automatically have records annotated with their presence in 1kg.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3684 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 16:37:31 +00:00
ebanks
12c0de6170
Added ability to clean using only known indels. Added integration test for it. Fixed vcf->vc conversion for indels which was busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3678 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-30 01:20:56 +00:00
aaron
844cb2ed33
fixing a bug that Eric found with RODs for reads, where some records could be omitted. Sorry Eric!
Also putting more tolerance into the timing on the tribble index tests (which check that we delete out-of-date indexes and don't delete perfectly good ones). It seems that some of the farm nodes aren't great with a stopwatch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3674 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:38:55 +00:00
ebanks
baf9479c35
An addition for Sendu since he can't seem to tell when his CountCovariate jobs die in the middle of writing the CSVs. We now write an EOF marker at the end of the covariates table and look for it when reading in the file in TableRecalibrationWalker. By default, we warn the user if the EOF marker isn't present, but we exception out if the user provides the --fail_with_no_eof_marker option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3670 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 18:50:07 +00:00
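The EOF-marker check described in the commit above can be sketched as follows. This is a minimal Python illustration, not the TableRecalibrationWalker code: the marker string "EOF" and the helper names format_table and read_table are hypothetical, and only the warn-by-default versus fail-on-flag behaviour mirrors the commit message.

```python
import warnings

# Hypothetical marker; the commit does not say which string GATK writes.
EOF_MARKER = "EOF"


def format_table(rows):
    """Render rows as CSV lines and append the EOF marker as the final line."""
    lines = [",".join(row) + "\n" for row in rows]
    lines.append(EOF_MARKER + "\n")
    return lines


def read_table(lines, fail_with_no_eof_marker=False):
    """Parse CSV lines, checking that the table ends with the EOF marker.

    A missing marker suggests the writer died mid-file: warn by default,
    or raise if the caller asked for strict behaviour.
    """
    rows = []
    saw_marker = False
    for line in lines:
        line = line.rstrip("\n")
        if line == EOF_MARKER:
            saw_marker = True
            break
        if line:
            rows.append(line.split(","))
    if not saw_marker:
        msg = "covariates table has no EOF marker; the file may be truncated"
        if fail_with_no_eof_marker:
            raise ValueError(msg)
        warnings.warn(msg)
    return rows
```

Writing the output with out.writelines(format_table(rows)) keeps the marker as the very last thing flushed, so its presence is a cheap signal that the whole table made it to disk.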
ebanks
4a451949ba
add parallel option to target creator for masking out reads with bad mates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3663 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 22:13:25 +00:00
ebanks
6a23edd911
Fix performance tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3662 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 21:51:48 +00:00
aaron
62d22ff1aa
adding the original allele list to a variant context (as the annotation ORIGINAL_ALLELE_LIST), in the case where the set alleles are the result of clipping. Added tests for both cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3658 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 17:23:46 +00:00
ebanks
1292c96e29
The cleaner now adds the OC (original cigar) and OS (original alignment start) tags as appropriate to reads that get realigned; this feature can be turned off. Also, improved integration tests (sorry, Kiran!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3657 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 16:46:47 +00:00
ebanks
bf5cbad04c
Make the target creator a rod walker (that allows reads) so that we can easily trigger the cleaner on only known indel sites. Adding an integration test to cover this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3651 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 13:28:37 +00:00
ebanks
8e848ccd84
SAMFileWriters can now write to /dev/null without throwing exceptions, so we can remove the try/catch blocks.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3648 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-27 03:59:10 +00:00
aaron
09ccdf83b2
fixing a broken test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3647 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:59:00 +00:00
aaron
5f8a3f95ef
The GT field once again reigns supreme (it must be the first genotype field). Thanks for the catch Eric.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3645 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 21:03:05 +00:00
aaron
b3edb7dc08
two fixes for the VCF 4 parser:
- Allow the "GT" field in genotypes at any point in the genotype string (before we required they be the first key-value pair).
- Fix a bug with the phasing value put into the VariantContext, thanks for the catch Guillermo!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3638 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 18:01:23 +00:00
weisburd
e15fe6858e
Disabling test - will need to update big-tables soon; will re-enable after updating md5
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3637 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-25 15:43:41 +00:00
aaron
682f9b46c6
Two fixes together:
1) Some improvements to the VCF4 parsing, including disabling validation.
2) Reimplemented RefSeq in the new Tribble-style rod system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:17:03 +00:00
aaron
62bc7651a8
fix for PSPW with DbSNP mask. Added an integration test for this case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3628 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 19:31:32 +00:00
aaron
8a9b2f4256
removing the GLF ROD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3624 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 22:51:45 +00:00
aaron
611d834092
a couple of VCF 4 improvements:
-Validation of INFO and FORMAT fields.
-Conversion to the correct type for info fields (e.g. allele frequency is now stored as a float instead of a string).
-Checks for CNV-style alternate allele encodings (e.g. <INS:ME:L1>); right now we throw an exception. Maybe we should just warn the user?
-Tests for the multiple-base polymorphism allele case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3622 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 20:21:43 +00:00
ebanks
b6bceb39b0
Fixing up output for performance tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3619 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 17:00:17 +00:00
hanna
003dd4de3e
Rev Picard with performance enhancements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3615 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 22:54:23 +00:00
aaron
0cafd3d642
clip VCF alleles for indels: only a single left base, and as many right bases as align before converting to variant context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3614 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 22:42:38 +00:00
aaron
9872b65803
clip to the null allele on the reference string in VCF 4, instead of stopping to preserve one reference base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3613 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 20:52:19 +00:00
ebanks
b5df2705c9
-Remove Nway output option
-Remove in-memory sorting
-Default to name-sorting (although we allow coordinate sorting with the --sortInCoordinateOrderEvenThoughItIsHighlyUnsafe flag).
Cleaner, faster code. Wiki has been updated (including how to use FixMateInformation.jar from Picard). More changes coming soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3612 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 20:31:55 +00:00
aaron
a6d3e4bd47
Add code to allow reference alleles with 'N' in VariantContext, but not in the alternate allele(s). Also more updates to the VCF 4 code (fixed parsing for files without genotypes).
This check-in will temporarily break the build (I need to see if Bamboo is correctly returning the log file for failed builds).
Will be fixed once Bamboo starts building.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3609 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 18:26:37 +00:00
ebanks
824c2bbac0
Finishing previous checkin
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3608 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 17:21:38 +00:00
aaron
32f324a009
incremental changes to the VCF4 codec, including allele clipping down to the minimum reference allele; adding unit testing for certain aspects of the parsing. Not ready for prime-time yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3604 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 06:31:05 +00:00
bthomas
300a18b85f
Updating the way reference data is processed, so GATK creates the .fasta.fai and .dict files automatically. If either (or both) don't exist, GATK will create them in the same folder as the fasta file. If it can't write the file, GATK will fail with a message to create them manually.
Note that this functionality only works if the directory containing the fasta is writeable. GATK will fail if the directory is read-only and either the .fasta.fai or .dict file doesn't exist. In the future, we could create these references in memory, but we decided against it this time.
Locking was also added to ReferenceDataSource so that no issues come up while running multiple GATKs on the same reference: we don't want one process to be half-finished writing a file while another tries to read it. As a result, you may see error messages related to locking. See ReferenceDataSource.java for an explanation of the locking strategy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3601 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-21 21:42:42 +00:00
hanna
c806ffba5f
Switching over DownsamplingLocusIteratorByState -> LocusIteratorByState. Some operations
will not be as fast as they could be because the workflow is currently merge sam records (sharding)
-> split sam records (LocusIteratorByState) -> merge records (LocusIteratorByState) -> split
records (StratifiedAlignmentContext), but this will be fixed when StratifiedAlignmentContext
is updated to take advantage of the new functionality in ReadBackedPileup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3599 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-21 02:11:42 +00:00
depristo
57a13805da
GATK now uses an optimized indexing scheme in Tribble: 5x or more performance gain on files with many genotypes. Updated an integrationtest that was failing and was clearly wrong: DB=; isn't a valid annotation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3596 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-19 21:36:41 +00:00
kiran
8ff93f77e6
Added evaluation module to count functional classes (missense, nonsense, etc.). At the moment, it only understands Cancer's MAF annotations. Added integration test for the functional class counting. Added better description for VariantEval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3595 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 21:51:40 +00:00
ebanks
1e06d2bf68
Initial HLA Caller integration tests. Kind of painful, but will improve with code refactoring.
This baby is now officially ours.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3593 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 20:35:27 +00:00
rpoplin
724affc3cc
Major bug fixes for the Variant Recalibrator. Covariance matrix values are now allowed to be negative. When probabilities are multiplied together the calculation is done in log space, normalized, then converted back to real valued probabilities. Clustering weights have been changed to only use HapMap and by-1000genomes sites. The -nI argument was removed and now clustering simply runs until convergence. Test cases seem to work best when using just two annotations (QD and SB). More changes are in the works and are being evaluated. Misc fixes to walkers that use RScript due to CentOS changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3590 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 17:37:11 +00:00
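The Variant Recalibrator commit above mentions multiplying probabilities in log space, normalizing, and converting back. The recalibrator's own code is not shown here, so the following is only a minimal Python sketch of that general technique with hypothetical function names: logs are summed instead of probabilities multiplied, and the maximum is subtracted before exponentiating (the log-sum-exp trick) so the largest term becomes exp(0) = 1 and nothing underflows.

```python
import math


def normalize_from_log(log_probs):
    """Turn unnormalized log-probabilities into probabilities summing to 1.

    Subtracting the maximum first keeps every exponent <= 0, so the
    largest term is exactly 1.0 and the sum cannot underflow to zero.
    """
    max_log = max(log_probs)
    scaled = [math.exp(lp - max_log) for lp in log_probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def combine_in_log_space(prob_lists):
    """Multiply per-source probabilities elementwise by summing their logs."""
    n = len(prob_lists[0])
    log_products = [sum(math.log(p[i]) for p in prob_lists) for i in range(n)]
    return normalize_from_log(log_products)
```

For example, combine_in_log_space([[1e-200, 1e-201], [1e-200, 1e-200]]) returns roughly [0.909, 0.091], whereas the direct product 1e-200 * 1e-200 underflows to 0.0 in double precision.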
aaron
c3434493b0
fixed integration test for VCF Header changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3589 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 16:31:48 +00:00
aaron
42e7ff4f28
forgot to update a test, the md5sum of the underlying file changed (which is recorded in the ROD tests).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3586 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 13:27:56 +00:00
aaron
b978d5946b
adding changes for VCF 4, mostly in the way we handle VCF headers. The header fields are now aware of the differences between different VCF formats. There was also a bunch of clean-up of out-of-spec VCF used in the tests (mismatched VCF file format fields, etc), and updates to the associated integration tests. Also some logging statements for BTI.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3584 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 08:23:23 +00:00
weisburd
e26a273ef5
Turned the test back on
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3582 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:57:42 +00:00
hanna
48cbc5ce37
Merging the sharding-specific inherited classes down into the base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3581 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 22:36:13 +00:00
hanna
612c3fdd9d
First pass at eliminating the old sharding system. Classes required for the original sharding system
are gone where I could identify them, but hierarchies that split to support two sharding systems have
not yet been taken apart.
@Eric: ~4k lines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 20:17:31 +00:00
aaron
3d049204ed
some refactoring for the variant eval output system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3576 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-17 05:34:31 +00:00
hanna
db1383d0b2
Rev the latest version of Picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3575 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 23:55:07 +00:00
weisburd
5b370ffc62
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3574 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:42:58 +00:00
ebanks
01ffa307c2
When going NWay out in the cleaner, use the new *merged* header (instead of the original one) for each bam file so that it matches the new uniquified read group ids in the reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3569 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 19:36:36 +00:00
ebanks
7a91dbd490
Renamed some of the column names in Ti/Tv and Concordance modules so that they are clearer. Removed ValidationRate module (it was busted).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3564 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 15:53:06 +00:00
asivache
671ac00748
A simple utility class that implements a merging Iterator<GenomeLoc> built over an interval or bed file (this is NOT a rod, but rather a direct line-by-line file reader that converts strings to genome locs on the fly and merges overlapping intervals)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3546 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 15:54:37 +00:00
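The merging interval iterator from the commit above can be sketched in Python rather than Java. Loc stands in for GenomeLoc and, like the other names here, is hypothetical. The sketch assumes input lines of the form contig:start-stop (1-based, closed) that are already sorted by contig and start; BED files use 0-based half-open coordinates and would need converting first.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Loc(NamedTuple):
    """Hypothetical stand-in for GenomeLoc: a 1-based, closed interval."""
    contig: str
    start: int
    stop: int


def parse_loc(line: str) -> Loc:
    """Parse 'contig:start-stop'; rsplit keeps contig names containing ':'."""
    contig, span = line.strip().rsplit(":", 1)
    start, stop = span.split("-")
    return Loc(contig, int(start), int(stop))


def merging_loc_iterator(lines: Iterable[str]) -> Iterator[Loc]:
    """Lazily parse lines and yield intervals with overlaps merged.

    Only one pending interval is held in memory, so the file is never
    loaded whole, matching the line-by-line reading the commit describes.
    """
    current: Optional[Loc] = None
    for line in lines:
        if not line.strip():
            continue
        loc = parse_loc(line)
        if current is not None and loc.contig == current.contig and loc.start <= current.stop:
            # Overlap on the same contig: extend the pending interval.
            current = Loc(current.contig, current.start, max(current.stop, loc.stop))
        else:
            if current is not None:
                yield current
            current = loc
    if current is not None:
        yield current
```

Given ["chr1:100-200", "chr1:150-250", "chr1:300-400", "chr2:1-10"], the iterator yields chr1:100-250, chr1:300-400 and chr2:1-10. Whether abutting intervals such as 100-200 and 201-300 should also merge is a design choice the commit does not specify; this sketch keeps them separate.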
ebanks
8c28be5933
Fixing a VCF bug for Sendu: we weren't emitting flags (booleans) correctly in VCF3.3 (rev'ed tribble for this).
Updated dbsnp/hapmap membership info fields to be flags now instead of ints.
While I was there, I added the change in the Annotator for Jan to force reads to be from a specific sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3536 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 16:42:06 +00:00
bthomas
99b684ea89
Adding new support for reference data. ReferenceDataSource is a new class that manages reference data, and allows IndexedFastaSequenceFile to be a simple reader. This checkin also includes FastaSequenceIndexBuilder, which reads a fasta file and creates an index, like samtools faidx. Right now this is not enabled, because we are still working out thread safety. So the only new UI change is that GATK can be run without a fai file. Soon, we will enable 1) GATK to be run without a dict file too, and 2) both dict and fai files will be saved on disk for future program executions. For more info, see ReferenceDataSource.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3527 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:10:23 +00:00
ebanks
ca4eab1d23
Now annotations that require reads return null if there's no alignment context, so that running without reads adds annotations only for the appropriate fields.
Added an integration test for the read-less case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3525 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 20:36:46 +00:00
ebanks
9b2fcc4711
Refactoring of the annotation system:
1. VA is now a ROD walker so it no longer requires reads (needs a little more testing)
2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs)
3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:05:51 +00:00
aaron
6d5556939d
updating Tribble with a couple of important Tabix fixes, and updating the variant eval integration tests to run each test with both plain vcf and gzipped tabix (added the tabix version
to the validation directory), using the same md5sum.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3509 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 01:47:04 +00:00
depristo
6eeb1693ca
JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 00:33:02 +00:00
depristo
3ea506fe52
No more new Allele() -- must use create. All simple alleles are now cached for efficiency reasons. VCF4 codec optimizations -- 4x performance in general. Now working in general but hooked up to the ROD system now as VCF4. WARNING -- does not actually work with indels, genotype filters, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3489 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:03:55 +00:00
aaron
0b03e28b60
updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 06:34:26 +00:00
depristo
e2b41082af
GATK now does automatic adaptor filtering in locus iterators (but not in the experimental downsampling iterator). General support for LocusIteratorFilters, just like read filters but applying only at particular bases. Updated tools with new MD5 sums due to adaptor bases in their integrationtest data. Note that, as a side effect, reads close to each other with odd orientations are also filtered out. Updated a minor argument of VariantRecalibrator so that the qStep value can be changed on the command line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3481 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 22:26:32 +00:00
aaron
8ec091d6d2
re-enabling regeneration of the tribble index if it's out of date. Also moved the class that can detect text in the log4j stream (useful in testing to make sure appropriate messages are generated).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3480 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:45:51 +00:00
depristo
21427211c0
Personal MD5 database system now live. WalkerTest now maintains a database of result files associated with MD5 results in integrationtest/, and provides command lines for diffing expected vs. current MD5 results when an integration test fails. The suite currently takes 200 MB to store. Update and run integrationtest to build your very own expectation database for future development work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3466 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-31 16:06:16 +00:00
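The MD5 database associates each result file with its MD5 sum. Below is a minimal sketch of that keying using the standard JDK `MessageDigest` API. The flat one-file-per-MD5 layout and the `Md5ResultStore` name are hypothetical and do not describe WalkerTest's actual storage.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of an MD5-keyed result store; not WalkerTest's actual layout.
final class Md5ResultStore {
    // Lowercase, zero-padded 32-character hex MD5 of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e); // Java SE requires MD5 support
        }
    }

    // Save a result as dbDir/<md5> (if not already present) and return its path.
    static Path store(Path dbDir, byte[] result) throws IOException {
        Files.createDirectories(dbDir);
        Path target = dbDir.resolve(md5Hex(result));
        if (!Files.exists(target)) {
            Files.write(target, result);
        }
        return target;
    }
}
```

When an integration test fails, both the expected and the current MD5 then name files on disk, which is what makes a diff command line possible.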
depristo
2b02324587
Support for detecting and automatically excluding reads that read into the adaptor sequence and, if desired, for showing only the first read of the pair when two reads overlap in the fragment. Not enabled yet: this is an intermediate check-in before updating and verifying the impact on locus walkers everywhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3465 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-30 18:00:12 +00:00
ebanks
ffeb3fd80d
Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL contained posteriors instead of likelihoods. Not a huge deal because the priors were flat, but fixed nonetheless.
Also, needed to update Tribble.
Minor updates to the Beagle input maker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 19:28:26 +00:00
rpoplin
4e268ef6ac
Removing the Variant Recalibration Performance test because it isn't ready yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3460 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:27:25 +00:00
rpoplin
522dd7a5b2
Adding the variantrecalibration classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3459 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:21:27 +00:00
rpoplin
2014837f8a
VariantOptimizer package is moved to core, renamed as VariantRecalibration, and added to the binary release package. VariantOptimizer walker is renamed to GenerateVariantClustersWalker and ApplyVariantClustersWalker renamed to VariantRecalibrator. Integration tests added, performance tests still to be done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3458 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:20:18 +00:00
aaron
871cf0f4f6
Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of:
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class))
you'd say:
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class))
Which is more in-line with what was done before. All instances in the existing codebase should be switched over.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:52:44 +00:00
depristo
cc2bf549c8
Removing my unnecessary optimization. 10 lines later in the code the same optimization was applied. A monumental waste of time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3455 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:10:48 +00:00
aaron
a4d834cc01
fixing the test I broke
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3454 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 02:06:20 +00:00
depristo
f2e7582cfc
Reorganization of SW code for clarity. Total failure at raw optimization. Discovered that ~50% of reads being cleaned were perfect reference matches. New code comes with a flag to look at the NM field and not clean perfect matches. This can be turned off with a command-line option (needed for 1KG BAMs with bad NM fields). Going to rerun cleaning jobs because the stable codebase was accidentally rebuilt, losing 2 days of runtime.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3452 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 23:16:00 +00:00
ebanks
058441fa39
Trivial renaming of test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3441 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 16:56:42 +00:00
aaron
a2fab07258
fixed the build problem: there were two copies of the AnnotatorInputTable Codec and Feature in two different spots.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3439 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 14:47:15 +00:00
chartl
88a06ad81f
Changes to Depth of Coverage:
- For speedup in large number of samples, base counts are done on a per read group level, then
merged into counts on larger partitions (samples, libraries, etc)
+ passed all integration tests before next item
- Added additional summary item, a coverage threshold. Set by (possibly multiple) -ct flags,
the summary outputs will have columns for "%_bases_covered_to_X"; both per sample, and
per sample per interval summary files are affected (thus md5s changed for these)
NOTE:
This is the last revision that will include the per-gene summary files. Once DesignFileGenerator is sufficiently general, and has integration tests, it will be moved to core and the per-gene summary from Depth of Coverage will be retired.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3437 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 03:39:22 +00:00
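Each `-ct` threshold X adds a "%_bases_covered_to_X" summary column. Below is a minimal sketch of that percentage. It assumes "covered to X" means depth >= X (the commit does not say whether the comparison is inclusive) and takes a hypothetical array of per-base depths as input.

```java
// Hypothetical sketch; assumes "covered to X" means depth >= X.
final class CoverageThresholdSketch {
    static double[] percentCoveredTo(int[] depths, int[] thresholds) {
        double[] pct = new double[thresholds.length];
        if (depths.length == 0) return pct; // no bases: report 0% for every threshold
        for (int t = 0; t < thresholds.length; t++) {
            int covered = 0;
            for (int depth : depths) {
                if (depth >= thresholds[t]) covered++;
            }
            pct[t] = 100.0 * covered / depths.length;
        }
        return pct;
    }
}
```

With thresholds 1 and 10 on depths 0, 5, 10, 20, this gives 75.0 for %_bases_covered_to_1 and 50.0 for %_bases_covered_to_10.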
ebanks
0607f76a15
commenting out this test until I can figure out what the hell is going on with the codecs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3436 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 01:12:10 +00:00
ebanks
ae6c014884
Fixed UG parallelization bug. Better integration test to catch this in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3432 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 21:03:45 +00:00
ebanks
434e920da9
Oops, forgot to update integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3431 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 20:37:45 +00:00
delangel
a280a0ff0d
a) Made HaplotypeScore a default annotation. This changed several integration tests, whose MD5s are now updated.
b) Disabled BaseQualRankSumTest: the returned p-values differ wildly from Matlab/R-provided ones, cause TBD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3419 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 22:25:17 +00:00
chartl
745d7c582f
added integration test for intervals with no coverage due to filtering
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3414 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 16:52:42 +00:00
chartl
88cb93cc3c
Changes to Depth of Coverage: added maximum base and mapping quality flags, with new integration tests; because they use b36 and the other test uses hg18, they live in a different class (the integration test system can't change refs on the fly). Initial change to VariantAnnotator to allow it to see extended event pileups; you currently have to throw the -dels flag, and it's specified as "very experimental". Yet, all the integration tests pass.
Homopolymer Run now does the "right" thing (e.g. single bases are represented as HRun = 0 rather than HRun = 1) for indels. AlleleBalance now does something close enough to correct.
Added a convenience method to VariantContext that will return the indel length (or lengths if a site is not biallelic).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3409 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 13:02:01 +00:00
depristo
6faf101c6c
Minor improvements to Callable Loci for public consumption
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3408 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 12:50:11 +00:00
depristo
a10fca0d5c
Genotyper now is using bytes not chars. Passes all tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 21:02:44 +00:00
depristo
6ce3835622
Removing unused methods in QualityUtils; ReferenceContext now converts all bases to upper case (this can be disabled with a static boolean).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3399 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 12:38:06 +00:00
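The upper-casing of reference bases, together with the switch away from chars, can be pictured as a byte[]-based conversion guarded by a static flag. Below is a minimal sketch. The class name `RefBaseCase` and the flag `UPPERCASE_REFERENCE` are hypothetical stand-ins for whatever ReferenceContext actually uses.

```java
// Hypothetical sketch; the flag and class names are illustrative, not GATK's.
final class RefBaseCase {
    static boolean UPPERCASE_REFERENCE = true; // assumed switch to disable conversion

    // Returns a copy of the bases with ASCII a-z converted to A-Z
    // (lowercase in FASTA references often marks soft-masked sequence).
    static byte[] normalize(byte[] bases) {
        byte[] out = bases.clone();
        if (!UPPERCASE_REFERENCE) return out;
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] = (byte) (out[i] - ('a' - 'A'));
            }
        }
        return out;
    }
}
```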
depristo
5abac5c057
A few more char -> byte cleanups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3398 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 00:02:06 +00:00
depristo
8a725b6c93
Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. ReferenceContext is now byte[]-backed, not char[]-backed. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 23:27:55 +00:00
aaron
ca386439be
only emit a warning if the tribble index is out of date, don't remove and replace it for them. Added a test case where the log4j appender checks the logging messages for the appropriate output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3393 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 15:12:48 +00:00
hanna
017ab6b690
Experimental versions of downsampler and Ryan's deduper are now available either
as walker attributes or from the command-line. Not ready yet! Downsampling/deduping
works in a general sense, but this approach has not been completely optimized or validated.
Use with caution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3392 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 05:40:05 +00:00
aaron
7cfb9ff3dc
updates for Tribble 82, fixes for Ryan's case where multiple processes would attempt to read/write to the same index, and a couple other Tribble-centric bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3382 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 19:34:45 +00:00
chartl
e016491a3d
Major refactoring of Depth of Coverage to allow for more extensible partitions of data (now can do read group, sample, and library; in any combination; adding more is fairly easy). Changed the by-gene code to use clones of stats objects, rather than munging the interval DoCs. (Fix for Avinash. Who, hilariously, thinks my name is Carl.) Added sorting methods to ensure static ordering of header and body fields.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3377 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 16:58:13 +00:00
hanna
0791beab8f
Checking in downsampling iterator alongside LocusIteratorByState, and removing
the reference implementation. Also implemented a heap size monitor that can
be used to programmatically report the current heap size.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3367 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:00:44 +00:00
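Reporting the current heap size programmatically can be done with the standard `java.lang.Runtime` API. Below is a minimal sketch. The `HeapSnapshot` class is hypothetical, and the actual monitor may use a different mechanism, such as JMX's `MemoryMXBean`.

```java
// Hypothetical sketch built on java.lang.Runtime; not the GATK heap size monitor itself.
final class HeapSnapshot {
    // Bytes currently in use by the heap (allocated minus free).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Maximum heap the JVM will attempt to use (Long.MAX_VALUE if there is no limit).
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    static String report() {
        return String.format("heap used: %d MB of max %d MB",
                usedBytes() / (1024 * 1024), maxBytes() / (1024 * 1024));
    }
}
```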
chartl
b7d21627ab
Changes to DepthOfCoverage (JIRA items) and added back an integration test to cover it. Alterations to the design file generator to output all transcripts (rather than choosing one at random).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3366 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 17:23:00 +00:00
ebanks
32389dc0a9
Fixed GQ estimate when chosen genotype isn't the most likely according to the GLs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3362 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-14 19:17:46 +00:00
hanna
88bd7a2045
Reenabling UG parallelization performance tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3360 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 16:28:08 +00:00
hanna
0490909285
Fixed epic generic paths fail.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3359 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 15:59:57 +00:00
hanna
7ef87e5126
An integration test based on validating pileup to test parallelism in reads, reference, and RODs. This test runs in less
than a minute and fell over instantly in the case of the Tribble parallelism issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3358 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 15:40:43 +00:00
hanna
ceec525420
Got rid of stray unicode characters in copyright message.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3357 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 14:47:39 +00:00
ebanks
c81b910f73
Commenting out the parallelization test which is failing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3354 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 18:39:53 +00:00
aaron
cac98ba5ef
a couple of small documentation fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3353 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 17:40:27 +00:00
aaron
2c55ac1374
fixes for parallel processing problems with Tribble, a small bug in the resource pool, and some more documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3349 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 06:13:26 +00:00
ebanks
34969f304c
Adding dbsnp to all UG performance tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3347 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 15:48:05 +00:00
ebanks
140e43b93b
Checking in to see whether it fails. If I start getting bombarded with Bamboo error reports, I'm commenting it out...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3346 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 15:39:42 +00:00
ebanks
572b383fe2
Make VA annotate dbsnp again
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3345 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 14:06:53 +00:00
depristo
64ccaa4c6a
Walkers and integration tests that calculate and compare callable bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3328 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 21:33:47 +00:00
aaron
7d2df3f511
example windowed ROD walker for Kristian, and updates to Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3325 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 17:12:50 +00:00
rpoplin
57f254b13a
VE integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3324 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 13:58:25 +00:00
aaron
78409dca0d
turned off the progress output from tribble when making an index, and fixed a case where the index file isn't writable: we now make the index in memory instead.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3312 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 16:36:58 +00:00
aaron
a0d71540df
speed-up for VCF, adding code to the VCF reader to automagically make an index if one doesn't already exist, and a change to the VCF writer unit test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3305 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 20:19:42 +00:00
aaron
a68f3b2e9c
VCF moved over to tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3302 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 17:28:48 +00:00
aaron
ad11201235
adding more ROD pile-up tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3301 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 16:01:11 +00:00
aaron
f497213933
DbSNP moved over to tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3288 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 06:02:35 +00:00
ebanks
9dff578706
Added PG tag to bam header to let people know it's been cleaned.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3284 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 17:30:30 +00:00
ebanks
850f36aa61
Changes to the Unified Genotyper's arguments:
1. User can specify 4 confidence thresholds: for calling vs. emitting and at standard vs. 'trigger' sites.
2. User can cap the base quality by the read's mapping quality (not done yet).
3. Default confidence threshold is now Q30.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3281 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 16:44:24 +00:00
aaron
cbed0b1ade
Adding GeliText tribble track as the first enabled Tribble track. This means 'Variants' is no longer valid for a ROD type; use GeliText instead. I've updated all the references in the codebase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3271 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 22:50:17 +00:00
aaron
7fbfd34315
adding the GELI ROD validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3270 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 21:43:00 +00:00
depristo
5dce16a8f1
Better genotype concordance module. Code refactoring for clarity (please see below/after for educational purposes). Now reports variant sensitivity, concordance, and genotype error rate by default. Also aggregates this data across all samples, so you get per-sample and overall stats for each of these in the allSamples row.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3265 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-28 13:10:11 +00:00
ebanks
df31eeff9f
minor change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3259 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 06:05:29 +00:00
depristo
7f4d5d9973
Ti/Tv by AC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3252 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 17:56:29 +00:00
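Ti/Tv by AC stratifies the transition/transversion ratio by allele count. Transitions are A<->G and C<->T substitutions, and every other base change is a transversion. Below is a minimal sketch. It assumes biallelic SNPs with distinct A/C/G/T ref and alt bases, and the `TiTvByAlleleCount` class is hypothetical rather than the VariantEval module.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Ti/Tv stratified by allele count; not the VariantEval module.
final class TiTvByAlleleCount {
    // allele count -> {transitions, transversions}
    private final Map<Integer, long[]> counts = new TreeMap<>();

    // Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
    // Assumes ref and alt are distinct bases from A/C/G/T.
    static boolean isTransition(char ref, char alt) {
        char r = Character.toUpperCase(ref), a = Character.toUpperCase(alt);
        return (r == 'A' && a == 'G') || (r == 'G' && a == 'A')
            || (r == 'C' && a == 'T') || (r == 'T' && a == 'C');
    }

    void add(char ref, char alt, int alleleCount) {
        long[] c = counts.computeIfAbsent(alleleCount, k -> new long[2]);
        if (isTransition(ref, alt)) c[0]++; else c[1]++;
    }

    // Ti/Tv for one allele count, or NaN if there are no transversions at that AC.
    double ratio(int alleleCount) {
        long[] c = counts.get(alleleCount);
        return (c == null || c[1] == 0) ? Double.NaN : (double) c[0] / c[1];
    }
}
```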
rpoplin
e7c0ded40e
Fixed long-standing bug in GenotypeConcordance module of VariantEval which caused incorrect numbers to be displayed in the concordance table. The format of the concordance table has changed. Added a concordance summary table which gives overall genotype concordance summary stats by sample. None of the VE integration tests contained genotype information so I added a comp track with genotypes to one of the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3247 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 15:48:41 +00:00
aaron
f050beada6
make sure we do delete the temp file we create
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3244 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 05:32:49 +00:00
aaron
536f22f3bd
adding VC adaptor for GELI, along with unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3243 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 05:28:39 +00:00
hanna
32d86cf457
Rev the reservoir downsampler to support partitioning through a functor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3232 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 19:50:26 +00:00
ebanks
e9e844fbf5
1. Reverting: dbsnp automatically is a comp
2. Fixing logic for min Qscore calculation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3230 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 18:51:35 +00:00
asivache
532263ea25
Oooops, forgot to update the test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3229 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 18:38:24 +00:00
ebanks
4abd3b0b7b
Fixing known/novel calc now that dbsnp isn't a default comp track
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3223 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 05:43:59 +00:00
ebanks
3b5673d967
1. Removed -all; by default all modules are used; use -none for no modules.
2. Don't make dbsnp track be a comp by default (to cut back on output). Please let me know if someone wants this back for some reason.
3. Cleaned up dbsnp module output to print the right numbers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3220 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 02:46:42 +00:00
aaron
4e18c54bb8
fixing a couple of commented out portions of the VCFReader test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3219 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 22:20:35 +00:00
aaron
80c4f88a72
removing the Variation interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3216 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 18:56:45 +00:00
hanna
c1e53d407d
The copyright tag that I copied/pasted from a LaTeX document into IntelliJ had
unicode quote characters embedded in it. These characters were invisible inside
IntelliJ but caused compile warnings for Ryan and Aaron, who for whatever reason
have a different default charset. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3203 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 15:26:32 +00:00
aaron
b5f6f54968
Almost done removing any trace of the old Variation and Genotype interfaces.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3202 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 14:52:15 +00:00
hanna
1bc26f69e9
An attempt to clean up the Utils directory. Email to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 23:00:08 +00:00
hanna
c08936d6f4
Added a reservoir downsampler which can sample elements in an iterator uniformly
from a stream (see Vitter 1985). Thanks to Eric and Andrey for the pointer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3197 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 20:48:14 +00:00
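Vitter's paper covers reservoir sampling. Its simplest variant, Algorithm R, works as follows: keep the first k items, then for the n-th item draw j uniformly from [0, n) and replace slot j if j < k. Each item is therefore retained with probability k/n. Below is a minimal sketch of Algorithm R over a plain `Iterator`. `ReservoirSketch` is hypothetical, and the GATK downsampler may use a different variant.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of Algorithm R; not necessarily the algorithm inside the GATK downsampler.
final class ReservoirSketch {
    // Returns a uniform random sample of up to k items from a stream of unknown length.
    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        if (k < 0) throw new IllegalArgumentException("k must be >= 0");
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);               // fill the reservoir with the first k items
            } else {
                int j = rng.nextInt(seen);         // uniform in [0, seen)
                if (j < k) reservoir.set(j, item); // keep this item with probability k / seen
            }
        }
        return reservoir;
    }
}
```

Memory stays at O(k) no matter how long the stream is, which matters when downsampling deep pileups.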
ebanks
c44f63c846
Fixing the performance tests: we need to catch the RuntimeException (not samtools' RuntimeIOException). Also, CountCovariates doesn't need the catch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3196 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 14:28:12 +00:00
ebanks
abf48cee05
Moving over to VariantContext from Variation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3195 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 06:56:29 +00:00
ebanks
d73c63a99a
Redoing the conversion to VariantContext: instead of walkers passing in a ref allele, they pass in the ref context and the adaptors create the allele. This is the right way of doing it.
Also, adding some more useful integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3194 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 05:47:17 +00:00
aaron
be7cbf948b
adding a catch for the exception thrown by samtools when it attempts to close /dev/null in the performance tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3186 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-16 17:41:48 +00:00
ebanks
7adff5b81a
Renaming for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3180 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:36:19 +00:00
ebanks
e702bea99f
Moving VE2 to core; calling it "VariantEval" (one more checkin coming)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3179 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:25:47 +00:00
chartl
ac6f6363ce
Execs() temporarily disabled after removal of bam file. New tests forthcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3178 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:11:56 +00:00
ebanks
ac9dc0b4b4
Removing VariantEval (v1); everyone should be using VE2 now. Docs coming ASAP.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3177 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:53:02 +00:00
ebanks
5f7564bf0a
Better naming of output columns
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3175 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 18:08:07 +00:00
aaron
e682460c1f
add a fix so that XL arguments won't cancel out -BTI arguments, fixed a bug for Ben where the ROD -> interval list conversion was throwing an exception, and some old code removal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3174 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 16:31:43 +00:00
ebanks
04909fa6ad
Removing arbitrary selects
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3169 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 17:46:39 +00:00
weisburd
b930dc52a5
Integration test for GenomicAnnotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3167 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 14:43:25 +00:00
ebanks
dde092fb61
Added the ability in VE2 to select which eval modules to run, so that you aren't forced to use all of them. You can use --list to list all of the possible modules to run.
Heads up everyone: by default, *no* modules are run. Please add "-all" to your scripts to maintain the previous behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3161 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 22:15:58 +00:00
hanna
8573b0bc6f
Refactoring intervals, separating the process of parsing interval lists,
sorting and merging interval lists, and creating RODs from intervals. This
gives Doug the ability to keep using our interval list parsing code when
sorting intervals on our behalf.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3159 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 15:50:38 +00:00
ebanks
e413882302
Generalizing the SequenomValidationConverter to be able to take in any arbitrary rod type (provided it can be converted to VariantContext).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3155 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-12 20:42:18 +00:00
ebanks
d06c7835d8
Adding performance tests for the indel realigner; should take ~3 hours.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3151 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-11 04:45:22 +00:00
ebanks
961ca05abc
Removed outdated Sequenom rod and renamed HapMapGenotypeROD to HapMapROD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3149 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-11 01:43:07 +00:00
ebanks
fa01876255
UnifiedGenotyper performance tests (WG, WEx); currently takes just over an hour.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3148 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 19:42:29 +00:00
rpoplin
c2a37e4b5c
Variant Quality Score modules in VariantEval2 no longer create huge lists which hold all of the quality scores encountered and instead cast the quality score to an integer and use hash tables. Bug fix for files in which all the quality scores are set to -1.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3146 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:36:06 +00:00
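Replacing a list of every quality score with integer-keyed hash table counts makes memory scale with the number of distinct integer scores rather than the number of calls. Below is a minimal sketch with a hypothetical `QualityHistogram` class (not the VariantEval module). It uses Java's truncating cast, as the commit describes. A -1 score simply lands in its own bin.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: histogram of integer-cast quality scores instead of a list of every score.
final class QualityHistogram {
    private final Map<Integer, Integer> counts = new HashMap<>();

    void add(double qual) {
        counts.merge((int) qual, 1, Integer::sum); // cast truncates toward zero, e.g. 30.7 -> 30
    }

    int count(int qualBin) {
        return counts.getOrDefault(qualBin, 0);
    }

    int distinctBins() {
        return counts.size();
    }
}
```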
ebanks
71f38a9199
Adding performance tests for the recalibrator (Whole Genome and Whole Exome tests).
Should take ~3 hours to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3145 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:30:59 +00:00
ebanks
fba48b515a
Heads up everyone:
For consistency, these tools should be writing to the walker's output stream and no longer use the -vcf argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3140 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 05:37:25 +00:00
chartl
7025f5b51d
Added an auxiliary table to DepthOfCoverage, which is the cumulative equivalent of the locus table (got tired of doing the calculation by hand). Also took care of a trailing tab in the per-locus output table.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3138 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 19:37:17 +00:00
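The cumulative equivalent of the locus table can be derived from the per-depth histogram with a suffix sum. Below is a minimal sketch. It assumes, hypothetically, that the locus table is an array where `hist[d]` counts loci with depth exactly d. The result `cum[d]` then counts loci with depth >= d.

```java
// Hypothetical sketch; assumes hist[d] = number of loci with depth exactly d.
final class CumulativeCoverageSketch {
    // Returns cum[d] = number of loci with depth >= d, via a right-to-left running sum.
    static long[] cumulative(long[] hist) {
        long[] cum = new long[hist.length];
        long running = 0;
        for (int d = hist.length - 1; d >= 0; d--) {
            running += hist[d];
            cum[d] = running;
        }
        return cum;
    }
}
```

For a histogram of {2, 3, 0, 5} this yields {10, 8, 5, 5}. Ten loci have depth of at least 0, and five have depth of at least 3.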
aaron
9f6377f7fb
added a performance test build option (for the upcoming performance test suite), and added a sample performance test for VariantEval.
IMPORTANT: having both -Dsingle and -Dsingleintegration to run single unit tests and integration tests was redundant; now you can just use -Dsingle to run a single performance, unit, or integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3136 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 15:37:15 +00:00
aaron
4014a8a674
A long overdue correction; all unit tests now end in 'UnitTest'. This was something we wanted to do for a while, and now with the performance tests coming, it was a good time to clean-up. Please label any new test appropriately: *UnitTest and *IntegrationTest are the two valid file name patterns for tests.
Thanks!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3135 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 06:14:15 +00:00
aaron
8fd59c8823
Modified the report system based on Ryan's feedback: tables are now created independently to avoid the permutation problem when they were all compressed in rows, and removed our dependency on FreeMarker. The Grep format stays the same.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3130 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:39:55 +00:00
depristo
918b746798
More detailed validation output. Fixes for genotyping overflow -- these are temporary and need to be properly resolved
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3129 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 16:38:28 +00:00
rpoplin
60c227d67f
Added new VE2 module to create a plot of titv ratio by variant quality score
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3125 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:19:27 +00:00
chartl
d7880ef7ad
Forgot to uncomment the AlignerIntegrationTest before committing. And yes, Matt, commenting it out is, in fact, easier than just setting my classpath.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3110 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 17:17:16 +00:00
chartl
f7d1b8f5de
CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive.
Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:27:23 +00:00
aaron
585cc880a2
changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:23:14 +00:00
bthomas
b4f6f54502
Reorganizing the way interval arguments are processed
...
Most of the changes occur in GenomeAnalysisEngine.java and GenomeLocParser.java:
-- parseIntervalRegion and parseGenomeLocs combined into parseIntervalArguments
-- initializeIntervals modified
-- some helper functions deprecated for cleanliness
Includes new set of unit tests, GenomeAnalysisEngineTest.java
New restrictions:
-- all interval arguments are now checked to be on the reference contig
-- all interval files must have one of the following extensions: .picard, .bed, .list, .intervals, .interval_list
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3106 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 12:47:48 +00:00
aaron
3d3d19a6a7
the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date.
...
A caveat: if you ask the RefMetaDataTracker for all of the RODs (rather than using the facilities to get a track by name), you'll now get back a collection of GATKFeature objects. This object will contain the track name and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc).
Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 22:39:56 +00:00
chartl
dc802aa26f
Moved CoverageStatistics to core. It will soon be renamed DepthOfCoverage, so please use CoverageStatistics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
depristo
8ea98faf47
Deleting the pooled calculation model -- no longer supported.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3088 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 11:44:27 +00:00
aaron
074ec77dcc
First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be
...
specified with the reportType command line option in VE2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
kshakir
20e3ba15ca
Added an optional argument -rgbl --read_group_black_list to filter read groups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3079 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 19:38:57 +00:00
ebanks
73a14a985b
Moving VariantsToVCF to core.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3078 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:55:12 +00:00
ebanks
14bf6923a8
HapMap-to-VCF now works fine within Variants-to-VCF. Added integration test for it and removed old code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3077 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-26 18:34:59 +00:00
ebanks
4398a8b370
Updated. Now uses VariantContext and is truly "variants" to vcf (i.e. not just GELI to vcf).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3074 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 04:53:31 +00:00
aaron
5079f35e40
better method names for read-based reference-ordered data access.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3069 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 16:13:31 +00:00
aaron
7462a0b2d1
cleaned up VariantContextAdapter tests, fixed the double comparisons in equals() in RodGeliText (nice MathUtils.compareDoubles, Kiran)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3064 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 15:18:30 +00:00
aaron
a69b8555dd
Geli to variant context.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3063 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 06:45:29 +00:00
aaron
eafdd047f7
GLF to variant context. Added some methods in GLF to aid testing; and added a test that reads GLF, converts to VC, writes GLF and reads back to compare.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3062 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-23 03:43:25 +00:00
asivache
ee1dc6092f
Test updated. We no longer throw an exception when a locus interval is out of bounds; instead we silently return a reference context trimmed to the current shard boundaries. The new test checks for trimming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3058 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 17:37:52 +00:00
aaron
439c34ed38
clean-up before annotating VariantEval2 for output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 07:39:20 +00:00
ebanks
4c4d048f14
Moving VariantFiltration over to use VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3048 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 18:35:23 +00:00
ebanks
c88a2a3027
Fixing/cleaning up the vcf merge util
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3047 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 15:13:32 +00:00
ebanks
03480c955c
And now the UnifiedGenotyper can officially annotate genotype (FORMAT) fields too.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3039 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 04:58:37 +00:00
ebanks
0311980668
The VariantAnnotator can now officially annotate genotype (FORMAT) fields.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3037 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 03:30:14 +00:00
aaron
8a5f0b746e
some cleanup for the output system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3032 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 12:54:39 +00:00
ebanks
0247548400
Fixed one test and (temporarily) punted on another
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3030 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 06:22:48 +00:00
ebanks
ee0e833616
Some significant changes to the annotator:
...
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now specify not only individual annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
ebanks
4340601c26
-Pushed base quals back down into SAMRecord; if -OQ is used, the SAMRecord quals get updated automatically
...
-Better integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3020 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-17 16:00:10 +00:00
hanna
2525ecaa43
Oops. Commented out some tests to improve performance and then checked in the commented out tests. Reverted.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3012 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 16:34:50 +00:00
hanna
6dd5f192e7
Performance improvements for RODs in conjunction with new sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3010 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 14:54:12 +00:00
aaron
10e76abbbc
adding some VE2 report infrastructure; work-in-progress.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3008 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 03:57:42 +00:00
ebanks
202231141c
-Push the --use_original_qualities argument into the engine.
...
-Check that base and qual strings are the same lengths
-Fix one more bug in the clipper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3006 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-16 02:06:11 +00:00
ebanks
411d25c8d1
-Integration tests for walkers that use original quals.
...
-framework for pushing -OQ into GATK (not done)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3004 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 18:46:31 +00:00
aaron
e365d308d4
add a new JEXLContext that lazy-evaluates JEXL expressions given the VariantContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3003 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-15 16:00:55 +00:00
ebanks
73d6167bd6
Fixing broken integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2998 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 23:18:49 +00:00
depristo
4dd7c5972c
Unit tests for -XL arguments; expt. annotation calculating the GC content within 100 bp of the current SNP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2997 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 21:08:14 +00:00
aaron
ecb59f5d0d
removed old tests and old code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2995 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:57:01 +00:00
aaron
88a48821ea
removed the dependence on removeRegion() in GenomeLocSortedSet
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2993 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 22:35:49 +00:00
depristo
b39b5edca8
Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
aaron
1eb5f97255
fixed dropping single base intervals from deleteRegion, moving onto performance fixes.
...
(stop - start is length - 1 on closed intervals, so we need to check for greater than or equal to zero)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2990 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:14:21 +00:00
aaron
661a043cef
adding methods to get RODs by name or type in read traversals, performance improvements to RODs for Reads in general, and some more Tribble infrastructure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2984 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 21:13:39 +00:00
hanna
a7ba88e649
Rework the way the MicroScheduler handles locus shards to handle intervals that span shards
...
with less memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 18:40:31 +00:00
aaron
dde9fd8a15
some rods-for-reads cleaning and performance improvements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2979 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:54:58 +00:00
depristo
4f4555c80f
PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
ebanks
40d305bc7e
Added test of Nway cleaning for Matt; thanks to Aaron for the help.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2977 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 21:00:41 +00:00
depristo
486bef9318
Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
ebanks
7ddd45d059
Hmm. I thought I removed this already.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2973 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:09:13 +00:00
ebanks
1a576525e9
misc improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2972 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 03:00:28 +00:00
ebanks
6e855809e1
Renaming and moving relevant tools into a sequenom directory
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2971 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 02:31:10 +00:00
chartl
0a49dffa8f
Row/Column names are now R-friendly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
ebanks
e5475a7ba9
re-enabling PlinkToVCF integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2964 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 17:35:49 +00:00
ebanks
5a20bf0e64
3 changes to UG which break integration tests:
...
1. emit AA,AB,BB likelihoods in the FORMAT field for Mark
2. remove constraint that genotype alleles (in the GT field) need to be lexicographically sorted.
3. Add bam file(s) used by genotyper to header for Kiran
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2963 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 17:16:47 +00:00
ebanks
9f3b99c11b
Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system.
...
Removing obsolete genotyping classes.
First stage of removing dependence on old Genotype class.
More changes to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 03:41:07 +00:00
chartl
bca9bdcc68
Add integration test for quartiles overflowing on interval reduce
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2957 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 16:18:45 +00:00
hanna
a7fe07c404
A few stopgap fixes to get the GATK to the point where the old sharding
...
infrastructure can be torn down:
1) New sharding system emulates old MonolithicSharding mechanism.
2) Better awareness of differences between fasta and BAM files when creating
shards.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2948 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 21:01:25 +00:00
hanna
dd6122f682
Fixed another bug in the original sharding system. Updated integration tests
...
as appropriate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2947 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-07 15:32:18 +00:00
hanna
ee2ec7ced9
Fix off-by-one error in original implementation of read sharding. Tested by
...
awking output of BamToFastq vs. samtools until the outputs matched exactly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2945 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-06 18:52:53 +00:00
depristo
ee913eca07
Forgot to check in fix this morning
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2943 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 21:07:19 +00:00
chartl
8738c544f1
Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
hanna
7104a3a96c
Fix for accumulator exception when running reduce by interval walkers without
...
intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2935 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 01:04:08 +00:00
aaron
366771d5a6
another fix for tests with multiple outputs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2934 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 22:46:15 +00:00
chartl
706d49d84c
Commit for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
aaron
54f04dc541
forgot to uncomment the auto-deletion of temp files...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2930 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 20:29:42 +00:00