asivache
c0891d512f
added: peekNextLocation(); it's quite hard (and probably unnecessary, ever) to make seekable iterator a peekable one, but it is quite easy and useful to be able to peek just the next location the iterator will jump to after next call to next()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2581 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:38:19 +00:00
rpoplin
9bf0d7250a
Fixing the testOtherOutput UG integration test so it will run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2580 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 13:40:14 +00:00
ebanks
a082b948a3
Support throughout for S and N cigar elements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2579 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 03:45:42 +00:00
chartl
424d1b57f7
Sequenom to VCF now allows user to specify filters for QC, and they will appear in the filter field of the output VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2577 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 23:22:37 +00:00
rpoplin
f96b2b211e
My last checkin updating R code broke an unrelated UnifiedGenotyper integration test. Eric says that I should take out the verbose test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2576 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 22:28:10 +00:00
rpoplin
49c44e7b36
PairedReadOrderCovariate is now a standard covariate and because of this CycleCovariate no longer multiplies by negative one for second of pair reads. Added PairedReadOrderCovariate to some of the integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2574 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 20:09:10 +00:00
hanna
05575e2e56
Better bounding for the locus window. Don't make the locus window calculation blow up if the GenomeLoc ends
...
up being outside the reference. Force the blowup elsewhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2573 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 17:03:54 +00:00
ebanks
8ca5bba738
We emit genotype data in the VCF record if the format string instructs us to (regardless of whether or not genotypes are provided - this was the wrong test).
...
SequenomToVCF now correctly has no-calls when probes fail.
Re-enabled SequenomToVCF integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2572 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:40:27 +00:00
chartl
6d1107a4ed
Update to SequenomToVCF
...
Output changing slightly so integration test disabled temporarily
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2571 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:32:05 +00:00
ebanks
f99586f91b
Added integration test for beagle and verbose output in UG.
...
Minor cleanup of VCFRecord code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2570 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 03:55:24 +00:00
hanna
02e23e2d9c
Threading support for beagle output files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2569 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 02:42:16 +00:00
aaron
0513690416
two fixes in the new cached DbSNP code:
...
-isBiallelic would incorrectly say triallelic sites are biallelic.
-getAlternateAlleleList was broken, since the new cached list is immutable, we couldn’t remove list items.
Also added a dbSNP validating walker to the one-offs, for testing the new b37 130 dbSNP rod.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2568 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 00:27:34 +00:00
asivache
a138bad95a
A rare but not-so-subtle bug fixed: a funky alignment (a kind that should not have been generated in the first place) could make the indel left-adjusting method to overshoot read start and build a cigar like -3M6I...
...
also, few minor fix-ups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2567 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 21:29:50 +00:00
rpoplin
b51f4aae11
Updating the recalibrator to make use of StingSAMFileWriter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2566 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 20:58:27 +00:00
rpoplin
c8ad025ad0
cleaning up unused import statements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2565 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:52:37 +00:00
rpoplin
189829841b
The recalibrator now uses all input RODs when looking for known polymorphic sites not just the one named dbsnp. Added an integration test which uses both dbsnp and an input vcf file and skips over the union of the two.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2564 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:50:39 +00:00
aaron
16777e3875
more fixes for the empty interval list problem; you can now run LocusWindow traversals with an empty interval list, but the GATK will give you a warning (unless you're running in unsafe mode).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2563 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:47:43 +00:00
hanna
35a4fcc481
Additional sanity checking: make sure the user can't alter the header / compression level / presorted state of a file to which SAMRecords have already been written.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2562 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:39:41 +00:00
ebanks
03b7d5f5c7
1. Fixed small but embarrassing bug in weighted Allele Balance annotation calculation.
...
2. Made RankSumTest abstract; added 2 subclasses: BaseQualityRST and MappingQualityRST (the latter based on a suggestion from Mark Daly). Untested so they're still experimental.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2561 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:33:53 +00:00
hanna
58999a8e9d
Enhance the I/O management system to support custom headers and set the presorted flag
...
from the initialize() method (or at any time before the first SAM record is written).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2560 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:21:42 +00:00
aaron
3c5f5177b1
check to see if the parsed interval list is empty, since we now allow interval files that are empty. If so, make sure we default to a non-interval based traversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2559 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 17:52:27 +00:00
ebanks
040fdfee61
Cleaned up the interface to VCFRecord. It's now possible (and easy) to create records and then write them with a VCFWriter.
...
I've updated HapMap2VCF to use the new interface; Chris agreed to take care of Sequenom2VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2558 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 21:42:12 +00:00
ebanks
42aff1d2c3
Annotator in general should be able to annotate monomorphic or tri-allelic sites.
...
It's up to the individual annotations to decide whether they want to annotate or not.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2556 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:52:18 +00:00
rpoplin
11f91b3c95
Reverting Eric's previous change because it killed the PG tag in the output bam file header. Added a new -compress command line argument to set the compression level of the output bam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2555 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:02:56 +00:00
chartl
dfa3c3b875
Added:
...
SequenomToVCF - Takes a sequenom ped file and converts it to a VCF file with the proper metrics for QC. It's currently a rough draft,
but is working as expected on a test ped file, which is included as an integration test.
Modified:
VCFGenotypeCall -- added a cloneCall() method that returns a clone of the call
Hapmap2VCF -- removed a VCFGenotypeCall object that gets instantiated and modified but never used
(caused me all kinds of confusion when I was basing SequenomToVCF off of it)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2554 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 17:17:21 +00:00
rpoplin
62dd2fa5be
Fixing another bug in solid recal regarding negative strand reads. The isInconsistentColorSpace method incorrectly used the inconsistent tag added by parseColorSpace, the inconsistent tag is in the direction of the read like the color space tag, and not in the direction of the reference like everything else. This affects the recalibrated quality scores but the improvment in SNP calling performance is minor when using the default UG settings (min base quality 10).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2553 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 14:28:52 +00:00
ebanks
971834ca90
Added a walker to the vcf tools compilation: one that combines vcf records. Both merges and unions are supported (see documentation... when it gets written this week).
...
Also, moved some code that pulls samples out of rods from VCFUtils into SampleUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2552 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 06:45:11 +00:00
ebanks
80af0f2f54
Changed the OUTPUT_BAM_FILE argument from String to SAMFileWriter and removed the call to close().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2551 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 03:45:54 +00:00
hanna
7893aaefe9
Updates to chunk iteration. Includes the return of the dreaded *2.java files;
...
hopefully I can find a way to kill these off before the Picard patch is ready.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2550 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 20:20:56 +00:00
ebanks
fcce77c245
Added -beagle option to emit likelihoods file for use with the BEAGLE imputation engine; still experimental.
...
(Also converted getPileup -> getBasePileup)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2549 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 18:41:04 +00:00
rpoplin
9cbae53ee1
Bug fixes for both SET_Q_ZERO and REMOVE_REF_BIAS solid recal modes regarding proper handling of negative strand reads. These changes yield a minor improvment in HapMap sensitivity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2548 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 15:19:22 +00:00
ebanks
dfcd5ce25b
Fixed broken test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2547 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:13:01 +00:00
ebanks
d5ab002449
Curiously, it seems I never set the default base quality used by the Genotyper to 10. It's done now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2546 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:02:01 +00:00
ebanks
b468369dfa
-UG's call into VariantAnnotator now uses the full alignment context (as opposed to the filtered one)
...
-MQ0 annotation is now standard again
-Added AC and AN annotations to VCF output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2545 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 05:40:42 +00:00
rpoplin
f587ff46af
Tile is now a standard covariate. By default the TileCovariate returns -1 if tile can't be derived from the read's name. Added a new command line option -throwTileException which will force TileCovariate to throw an exception if tile can't be derived for a read. Singleton covariates, such as any read group without tile info, must be skipped over in TableRecalibration so that the sequential formulation doesn't apply the same correction more than once. TileCovariate class has been added to the Early Access package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2544 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:51:41 +00:00
asivache
d01bde36a4
Make sure that reference view holds enough bases to pass full-length deleted sequence to the walker's map() function in extended event mode (this addresses the problem of a deletion crossing the shard's boundary, so that an attempt to extract deleted bases results in a crash)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2543 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:37:22 +00:00
asivache
e9bc85c188
Now has methods that allow to 1) check if a location is within the bounds of the reference view; 2) expand reference view (i.e. expand the bounds and reload the reference sequence) in order to accomodate specified location. The second method can be called directly since it performs a check and if the location is already within the bounds, then returns immediately. The costly ref sequence reloading occurs only when the location is not fully contained within the current bounds.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2542 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:35:17 +00:00
asivache
7f91b4d824
Bug fix. It would be nice if we could extract ROD annotations for the whole length of an extended event (indel), and we tried... But alas, it does not work with the current ROD system (after extracting length on ref > 1 ROD data for a deletion, rod iterator crashes on the attempt to re-load annotations for next reference base)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2541 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:30:55 +00:00
rpoplin
5f58492401
A rogue QualityUtils.MAX_REASONABLE_Q_SCORE managed to get through my previous bug fix. It should instead check the command line -maxQ argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2540 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:17:39 +00:00
ebanks
c7a8dffa89
Check for division by 0 in annotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2539 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 19:27:15 +00:00
ebanks
9a658e6b18
-Fixed VCF header line bug
...
-Added useful trim() method for Strings for characters other than whitespace
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2538 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 17:51:41 +00:00
ebanks
b643a513bb
Minor interface change for VCFGenotypeRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2537 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 16:48:09 +00:00
andrewk
431e9c2c8b
Add dbSNP ID to VCF output records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2536 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 15:30:04 +00:00
depristo
076481f786
Fixes to mergeVCF -- now correctly supports merging of filter fields. Also removed incorrect hasFilteringCodes() function. Updated intergration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2535 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 14:50:13 +00:00
rpoplin
cea544871d
Fixed an issue with recalibrating original quality scores above Q40. There is a new option -maxQ which sets the maximum quality score possible for when a RecalDatum tries to compute its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting q scores above Q40. Added an integration test which makes use of this new -maxQ option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 13:50:30 +00:00
ebanks
6c739e30e0
1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease.
...
2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext. This is by no means complete - further updates coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 05:51:10 +00:00
depristo
a9245a58e2
Fix for incorrect exception throwing in VCFRecord. It is reasonable to ask for the non-ref allele freq at all ref sites. Was only passing in tests because isReference was broken
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2532 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 01:18:30 +00:00
depristo
7215526810
Fix to isReference() in VCFRecord. Change to VariantCounter to correctly counter only non-genotype variants, as well as update to VariantEvalWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2531 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 00:03:29 +00:00
andrewk
6c4ac9e663
Updated HapMap2VCF to use the VCFGenotypeWriterAdapter interface; fixed bug in VCFParameters that affects VariantsToVCF and HapMap2VCF when reference is lower-cased; added integration test for HapMap2VCF that checks for the lower-case issue by testing against Hg18 region that has lower-cased bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2530 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 21:27:11 +00:00
aaron
576594eda2
clean-up of the GATK paper genotyper, and better output formatting for the simple call format we emit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2529 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 20:54:56 +00:00