rpoplin
70df30fc1b
Added method to AlignmentUtils which takes a read's cigar and the refBases char array given to a ReadWalker and returns the aligned reference char array. Bug fix in solid_recal_modes to use this aligned reference array. Recalibrator version number is no longer separate for each of the two walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2589 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 15:36:59 +00:00
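The cigar-driven alignment described in this entry can be illustrated with a minimal sketch. This is not the actual AlignmentUtils code: the class name, method name, the use of a CIGAR string instead of a Cigar object, and the gap character are all assumptions made for illustration. It also assumes refBases starts at the read's alignment start and that soft clips are padded with the gap character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and signature are assumptions, not the GATK API.
class AlignedRefSketch {
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP])");

    /**
     * Walks the CIGAR and returns one reference character per read base.
     * refBases is assumed to begin at the read's alignment start.
     */
    static char[] alignedReference(String cigar, char[] refBases, char gap) {
        StringBuilder out = new StringBuilder();
        int refPos = 0;
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            switch (m.group(2).charAt(0)) {
                case 'M': // consumes read and reference: copy the reference bases
                    out.append(refBases, refPos, len);
                    refPos += len;
                    break;
                case 'I': // consumes read only: no reference base to pair with
                case 'S': // assumption: soft clips are padded the same way
                    for (int i = 0; i < len; i++) out.append(gap);
                    break;
                case 'D': // consumes reference only: skip those reference bases
                case 'N':
                    refPos += len;
                    break;
                default:  // H and P consume neither read nor reference
                    break;
            }
        }
        return out.toString().toCharArray();
    }
}
```

The returned array has the same length as the read, so read base i can be compared directly with element i of the result.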
ebanks
2a116bb5d6
Made the VCF validator a simple rod walker instead of having it be in a separate package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2588 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 06:39:06 +00:00
hanna
b19bb19f3d
First successful test of new sharding system prototype. Can traverse over reads from a single BAM file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
aaron
db9570ae29
Looks bigger than it is:
...
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code to the argument parsing system for default enums. We needed this so we could preserve the current unsafe flag and at the same time allow finer-grained control of unsafe operations. You can now specify:
"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 00:14:35 +00:00
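One way an enum argument with a default value could behave is sketched here. The constants ALLOW_UNINDEXED_BAM and NO_READ_ORDER_VERIFICATION come from the commit message; the ALL constant, the class name, and both methods are assumptions for illustration and do not reflect the real argument parsing system.

```java
import java.util.List;

// Hypothetical sketch of a flag that takes an optional enum value.
class UnsafeFlagSketch {
    // ALL is an assumed name for the default used when -U is given without a value.
    enum Unsafe { ALL, ALLOW_UNINDEXED_BAM, NO_READ_ORDER_VERIFICATION }

    /** Returns the value given to -U, the default when -U is bare, or null when -U is absent. */
    static Unsafe parseUnsafe(List<String> args) {
        int i = args.indexOf("-U");
        if (i < 0) return null;
        // A following token that is not another flag is treated as the enum value.
        boolean hasValue = i + 1 < args.size() && !args.get(i + 1).startsWith("-");
        return hasValue ? Unsafe.valueOf(args.get(i + 1)) : Unsafe.ALL;
    }

    /** True when the parsed flag permits the requested unsafe operation. */
    static boolean allows(Unsafe given, Unsafe wanted) {
        return given == Unsafe.ALL || given == wanted;
    }
}
```

With this shape, a bare "-U" keeps its old meaning, while a named value enables only that one operation.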
asivache
cff8b705c0
Oh, and the test would not work anymore...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2585 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:47:09 +00:00
kiran
04fdbbfa65
This is the beginning of a new version of VariantEval that can cut VCF files up in a variety of ways with JEXL expressions, select one sample out of a multi-sample VCF, and can load analysis modules dynamically.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2584 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:58 +00:00
asivache
df63f51253
No changes, just sync-ing; only some commented out debugging prints are added...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2583 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:15 +00:00
asivache
d85461c463
MergingIterator completely re-done. Now it is not a generic class (sorry guys), but rather it is tailored for merging ROD tracks. This implementation peeks the locations of next ROD annotations in each track, but does not actually read these RODs from underlying streams until the location is reached and it is time to actually return the object. Now underlying ROD track iterators (registered in the resource pool!) are not advanced prematurely past the current position and all the way to the next ROD record wherever it is, so that the sharding system can reuse them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2582 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:43:36 +00:00
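The peek-then-read idea in this entry might look roughly like the following sketch. It is not the MergingIterator implementation: the Track class, the use of plain long positions in place of ROD records, and the mergeUpTo method are all hypothetical. The point it illustrates is that a track is only advanced when its peeked location is the next one to return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; positions stand in for ROD records.
class PeekMergeSketch {
    static class Track {
        private final long[] positions; // sorted record locations
        private int cursor = 0;
        int recordsRead = 0;            // counts real reads, for illustration

        Track(long... sortedPositions) { this.positions = sortedPositions; }

        boolean hasNext() { return cursor < positions.length; }

        /** Location of the next record, without consuming it. */
        long peekNextLocation() { return positions[cursor]; }

        /** Consumes and returns the next record. */
        long next() { recordsRead++; return positions[cursor++]; }
    }

    /** Returns all records located at or before limit, in location order. */
    static List<Long> mergeUpTo(List<Track> tracks, long limit) {
        List<Long> out = new ArrayList<>();
        while (true) {
            Track best = null;
            for (Track t : tracks) {
                if (t.hasNext() && (best == null || t.peekNextLocation() < best.peekNextLocation())) {
                    best = t;
                }
            }
            // Stop before reading anything past the limit, so tracks stay reusable.
            if (best == null || best.peekNextLocation() > limit) return out;
            out.add(best.next());
        }
    }
}
```

A track whose next record lies beyond the limit is never advanced, which is the property the commit message says the sharding system relies on.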
asivache
c0891d512f
added: peekNextLocation(); it's quite hard (and probably unnecessary, ever) to make seekable iterator a peekable one, but it is quite easy and useful to be able to peek just the next location the iterator will jump to after next call to next()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2581 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:38:19 +00:00
rpoplin
9bf0d7250a
Fixing the testOtherOutput UG integration test so it will run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2580 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 13:40:14 +00:00
ebanks
a082b948a3
Support throughout for S and N cigar elements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2579 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 03:45:42 +00:00
chartl
424d1b57f7
Sequenom to VCF now allows user to specify filters for QC, and they will appear in the filter field of the output VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2577 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 23:22:37 +00:00
rpoplin
f96b2b211e
My last checkin updating R code broke an unrelated UnifiedGenotyper integration test. Eric says that I should take out the verbose test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2576 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 22:28:10 +00:00
rpoplin
49c44e7b36
PairedReadOrderCovariate is now a standard covariate and because of this CycleCovariate no longer multiplies by negative one for second of pair reads. Added PairedReadOrderCovariate to some of the integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2574 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 20:09:10 +00:00
hanna
05575e2e56
Better bounding for the locus window. Don't make the locus window calculation blow up if the GenomeLoc ends up being outside the reference. Force the blowup elsewhere.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2573 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 17:03:54 +00:00
ebanks
8ca5bba738
We emit genotype data in the VCF record if the format string instructs us to (regardless of whether or not genotypes are provided - this was the wrong test).
...
SequenomToVCF now correctly has no-calls when probes fail.
Re-enabled SequenomToVCF integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2572 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:40:27 +00:00
chartl
6d1107a4ed
Update to SequenomToVCF
...
Output changing slightly so integration test disabled temporarily
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2571 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:32:05 +00:00
ebanks
f99586f91b
Added integration test for beagle and verbose output in UG.
...
Minor cleanup of VCFRecord code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2570 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 03:55:24 +00:00
hanna
02e23e2d9c
Threading support for beagle output files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2569 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 02:42:16 +00:00
aaron
0513690416
two fixes in the new cached DbSNP code:
...
-isBiallelic would incorrectly say triallelic sites are biallelic.
-getAlternateAlleleList was broken, since the new cached list is immutable, we couldn’t remove list items.
Also added a dbSNP validating walker to the one-offs, for testing the new b37 130 dbSNP rod.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2568 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 00:27:34 +00:00
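The two fixes in this entry can be illustrated with a small sketch. The class below is hypothetical and is not the cached DbSNP code; in particular, it assumes the cached allele list includes the reference allele. It shows the general pattern of copying an unmodifiable list before removing from it, and of deciding biallelic status from the number of alternate alleles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; assumes the cached list holds the reference allele plus alternates.
class CachedAllelesSketch {
    private final List<String> alleles; // cached and unmodifiable
    private final String ref;

    CachedAllelesSketch(String ref, String... allAlleles) {
        this.ref = ref;
        this.alleles = Collections.unmodifiableList(Arrays.asList(allAlleles));
    }

    /** Copies the cached list first; removing from the cached list itself would throw. */
    List<String> getAlternateAlleleList() {
        List<String> alts = new ArrayList<>(alleles);
        alts.remove(ref); // remove(Object) drops the first occurrence of the reference allele
        return alts;
    }

    /** Biallelic means exactly one alternate allele, so a triallelic site returns false. */
    boolean isBiallelic() {
        return getAlternateAlleleList().size() == 1;
    }
}
```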
asivache
a138bad95a
A rare but not-so-subtle bug fixed: a funky alignment (a kind that should not have been generated in the first place) could make the indel left-adjusting method overshoot the read start and build a cigar like -3M6I...
...
also, few minor fix-ups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2567 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 21:29:50 +00:00
rpoplin
b51f4aae11
Updating the recalibrator to make use of StingSAMFileWriter.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2566 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 20:58:27 +00:00
rpoplin
c8ad025ad0
cleaning up unused import statements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2565 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:52:37 +00:00
rpoplin
189829841b
The recalibrator now uses all input RODs when looking for known polymorphic sites not just the one named dbsnp. Added an integration test which uses both dbsnp and an input vcf file and skips over the union of the two.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2564 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:50:39 +00:00
aaron
16777e3875
more fixes for the empty interval list problem; you can now run LocusWindow traversals with an empty interval list, but the GATK will give you a warning (unless you're running in unsafe mode).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2563 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:47:43 +00:00
hanna
35a4fcc481
Additional sanity checking: make sure the user can't alter the header / compression level / presorted state of a file to which SAMRecords have already been written.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2562 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:39:41 +00:00
ebanks
03b7d5f5c7
1. Fixed small but embarrassing bug in weighted Allele Balance annotation calculation.
...
2. Made RankSumTest abstract; added 2 subclasses: BaseQualityRST and MappingQualityRST (the latter based on a suggestion from Mark Daly). Untested so they're still experimental.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2561 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:33:53 +00:00
hanna
58999a8e9d
Enhance the I/O management system to support custom headers and set the presorted flag from the initialize() method (or at any time before the first SAM record is written).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2560 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:21:42 +00:00
aaron
3c5f5177b1
check to see if the parsed interval list is empty, since we now allow interval files that are empty. If so, make sure we default to a non-interval based traversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2559 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 17:52:27 +00:00
ebanks
040fdfee61
Cleaned up the interface to VCFRecord. It's now possible (and easy) to create records and then write them with a VCFWriter.
...
I've updated HapMap2VCF to use the new interface; Chris agreed to take care of Sequenom2VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2558 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 21:42:12 +00:00
ebanks
42aff1d2c3
Annotator in general should be able to annotate monomorphic or tri-allelic sites.
...
It's up to the individual annotations to decide whether they want to annotate or not.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2556 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:52:18 +00:00
rpoplin
11f91b3c95
Reverting Eric's previous change because it killed the PG tag in the output bam file header. Added a new -compress command line argument to set the compression level of the output bam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2555 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:02:56 +00:00
chartl
dfa3c3b875
Added:
...
SequenomToVCF - Takes a sequenom ped file and converts it to a VCF file with the proper metrics for QC. It's currently a rough draft,
but is working as expected on a test ped file, which is included as an integration test.
Modified:
VCFGenotypeCall -- added a cloneCall() method that returns a clone of the call
Hapmap2VCF -- removed a VCFGenotypeCall object that gets instantiated and modified but never used
(caused me all kinds of confusion when I was basing SequenomToVCF off of it)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2554 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 17:17:21 +00:00
rpoplin
62dd2fa5be
Fixing another bug in solid recal regarding negative strand reads. The isInconsistentColorSpace method incorrectly used the inconsistent tag added by parseColorSpace: the inconsistent tag is in the direction of the read, like the color space tag, and not in the direction of the reference like everything else. This affects the recalibrated quality scores, but the improvement in SNP calling performance is minor when using the default UG settings (min base quality 10).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2553 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 14:28:52 +00:00
ebanks
971834ca90
Added a walker to the vcf tools compilation: one that combines vcf records. Both merges and unions are supported (see documentation... when it gets written this week).
...
Also, moved some code that pulls samples out of rods from VCFUtils into SampleUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2552 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 06:45:11 +00:00
ebanks
80af0f2f54
Changed the OUTPUT_BAM_FILE argument from String to SAMFileWriter and removed the call to close().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2551 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 03:45:54 +00:00
hanna
7893aaefe9
Updates to chunk iteration. Includes the return of the dreaded *2.java files; hopefully I can find a way to kill these off before the Picard patch is ready.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2550 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 20:20:56 +00:00
ebanks
fcce77c245
Added -beagle option to emit likelihoods file for use with the BEAGLE imputation engine; still experimental.
...
(Also converted getPileup -> getBasePileup)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2549 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 18:41:04 +00:00
rpoplin
9cbae53ee1
Bug fixes for both SET_Q_ZERO and REMOVE_REF_BIAS solid recal modes regarding proper handling of negative strand reads. These changes yield a minor improvement in HapMap sensitivity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2548 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 15:19:22 +00:00
ebanks
dfcd5ce25b
Fixed broken test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2547 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:13:01 +00:00
ebanks
d5ab002449
Curiously, it seems I never set the default base quality used by the Genotyper to 10. It's done now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2546 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:02:01 +00:00
ebanks
b468369dfa
-UG's call into VariantAnnotator now uses the full alignment context (as opposed to the filtered one)
...
-MQ0 annotation is now standard again
-Added AC and AN annotations to VCF output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2545 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 05:40:42 +00:00
rpoplin
f587ff46af
Tile is now a standard covariate. By default the TileCovariate returns -1 if tile can't be derived from the read's name. Added a new command line option -throwTileException which will force TileCovariate to throw an exception if tile can't be derived for a read. Singleton covariates, such as any read group without tile info, must be skipped over in TableRecalibration so that the sequential formulation doesn't apply the same correction more than once. TileCovariate class has been added to the Early Access package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2544 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:51:41 +00:00
asivache
d01bde36a4
Make sure that reference view holds enough bases to pass full-length deleted sequence to the walker's map() function in extended event mode (this addresses the problem of a deletion crossing the shard's boundary, so that an attempt to extract deleted bases results in a crash)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2543 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:37:22 +00:00
asivache
e9bc85c188
Now has methods that allow one to 1) check if a location is within the bounds of the reference view; 2) expand the reference view (i.e. expand the bounds and reload the reference sequence) in order to accommodate a specified location. The second method can be called directly since it performs a check and, if the location is already within the bounds, returns immediately. The costly ref sequence reloading occurs only when the location is not fully contained within the current bounds.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2542 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:35:17 +00:00
asivache
7f91b4d824
Bug fix. It would be nice if we could extract ROD annotations for the whole length of an extended event (indel), and we tried... But alas, it does not work with the current ROD system (after extracting length on ref > 1 ROD data for a deletion, rod iterator crashes on the attempt to re-load annotations for next reference base)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2541 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:30:55 +00:00
rpoplin
5f58492401
A rogue QualityUtils.MAX_REASONABLE_Q_SCORE managed to get through my previous bug fix. It should instead check the command line -maxQ argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2540 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:17:39 +00:00
ebanks
c7a8dffa89
Check for division by 0 in annotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2539 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 19:27:15 +00:00
ebanks
9a658e6b18
-Fixed VCF header line bug
...
-Added useful trim() method for Strings for characters other than whitespace
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2538 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 17:51:41 +00:00
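A trim() for an arbitrary character, as mentioned in this entry, could be as simple as the sketch below. The class name and signature are assumptions; the actual utility method may differ.

```java
// Hypothetical sketch of trimming a chosen character from both ends of a string.
class TrimSketch {
    static String trim(String s, char c) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == c) start++;   // skip leading matches
        while (end > start && s.charAt(end - 1) == c) end--;   // skip trailing matches
        return s.substring(start, end);
    }
}
```

Only leading and trailing occurrences are removed; occurrences inside the string are left alone.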
ebanks
b643a513bb
Minor interface change for VCFGenotypeRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2537 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 16:48:09 +00:00
andrewk
431e9c2c8b
Add dbSNP ID to VCF output records
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2536 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 15:30:04 +00:00
depristo
076481f786
Fixes to mergeVCF -- now correctly supports merging of filter fields. Also removed incorrect hasFilteringCodes() function. Updated integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2535 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 14:50:13 +00:00
rpoplin
cea544871d
Fixed an issue with recalibrating original quality scores above Q40. There is a new option -maxQ which sets the maximum quality score possible for when a RecalDatum tries to compute its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting q scores above Q40. Added an integration test which makes use of this new -maxQ option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 13:50:30 +00:00
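The effect of a -maxQ cap can be sketched as follows. This is not the RecalDatum code: the class name, method signature, and the +1 smoothing are assumptions. Only the standard Phred relation Q = -10 * log10(error rate) and the idea of a configurable ceiling are taken as given.

```java
// Hypothetical sketch of computing a capped quality score from a mismatch rate.
class MaxQSketch {
    static int empiricalQuality(long observations, long mismatches, int maxQ) {
        // Assumption: add-one smoothing keeps the error rate above zero so log10 is finite.
        double errorRate = (mismatches + 1.0) / (observations + 1.0);
        double q = -10.0 * Math.log10(errorRate);
        // The cap is a parameter rather than a hard-coded constant such as Q40.
        return (int) Math.min(Math.round(q), (long) maxQ);
    }
}
```

With a cap of 40, a bin with very few mismatches can never report more than Q40; raising the cap lets such bins report their computed quality.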
ebanks
6c739e30e0
1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease.
...
2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext. This is by no means complete - further updates coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 05:51:10 +00:00
depristo
a9245a58e2
Fix for incorrect exception throwing in VCFRecord. It is reasonable to ask for the non-ref allele freq at all ref sites. Was only passing in tests because isReference was broken
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2532 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 01:18:30 +00:00
depristo
7215526810
Fix to isReference() in VCFRecord. Change to VariantCounter to correctly count only non-genotype variants, as well as update to VariantEvalWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2531 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 00:03:29 +00:00
andrewk
6c4ac9e663
Updated HapMap2VCF to use the VCFGenotypeWriterAdapter interface; fixed bug in VCFParameters that affects VariantsToVCF and HapMap2VCF when reference is lower-cased; added integration test for HapMap2VCF that checks for the lower-case issue by testing against Hg18 region that has lower-cased bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2530 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 21:27:11 +00:00
aaron
576594eda2
clean-up of the GATK paper genotyper, and better output formatting for the simple call format we emit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2529 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 20:54:56 +00:00
chartl
7e3e714d3c
Moving experimental annotations from core to oneoffs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2528 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 19:34:10 +00:00
chartl
a32245f7d2
Modifications:
...
QualityUtils - Stole the BaseUtils code for flipping reads around and applied it to quality scores
SecondBaseSkew - Nothing's really different, just a commented line
Additions (experimental annotations for future development of second-base annotation)
** I DO NOT INTEND FOR ANYONE TO USE THESE **
- ProportionOfNonrefBasesSupportingSNP
- ProportionOfSNPSecondBasesSupportingRef
- ProportionOfRefSecondBasesSupportingSNP
+ I hope these are self-explanatory
- QualityAdjustedSecondBaseLod
+ Adjust lod-score by 10*log10[P[second bases are as observed]]
Added walker:
QualityScoreByStrand - oneoff project that's being saved if i ever need it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2527 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 19:18:07 +00:00
asivache
eb899741e1
reverting last changes. no caching
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2526 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 18:59:37 +00:00
asivache
a17d725c35
Cache pileup bases and mapping quals after first call to getBases() and getMappingQuals(), respectively. Subsequent calls to these methods will return cached arrays.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2525 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 18:05:00 +00:00
ebanks
d6fb19bb67
Don't hard-code base qual max
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2524 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 17:21:44 +00:00
rpoplin
75809100c6
Use inheritance so that shared code isn't duplicated between the RecalDatums
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2523 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:45:16 +00:00
ebanks
fdd14e1a01
Proposed interface for VariantContext. It's currently an interface so it doesn't break the build...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2521 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:31:39 +00:00
rpoplin
e011a1b6f8
Cut the memory footprint of the RecalDatum in half to improve performance of CountCovariates when run with many covariates.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2520 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 16:12:27 +00:00
rpoplin
370a365147
Small runtime improvement in TableRecalibration.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2519 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:51:12 +00:00
ebanks
b745c2f8d7
Fix for Jared: don't blow up if there are no samples in the input (since that's allowed) - but warn the user just in case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2518 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:37:06 +00:00
depristo
1e462419da
trivial code restructuring, and commented out failed attempt to support sample selection with VCF. VariantEval2 go go go
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2516 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:04:27 +00:00
depristo
f857159343
useful convenience function to get a genotype associated with a particular sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2515 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:03:07 +00:00
depristo
34519b3e3b
Better printing support for false positives and false negatives in concordance tables
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2514 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:02:40 +00:00
depristo
592749a7c1
isNBase method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2513 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:01:51 +00:00
depristo
5ce11c3dad
toString method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2512 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 15:01:20 +00:00
rpoplin
1c90e6a954
More informative error message in AnalyzeCovariates and cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2511 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:56:29 +00:00
depristo
bca3d1b943
useful convenience function to get a genotype associated with a particular sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2510 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:53:56 +00:00
depristo
ec774f62be
Some checking to protect the BasicGenotype
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2509 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:53:24 +00:00
rpoplin
71ecbe75d7
AnalyzeCovariates would crash with 'too many open files' exception when spawning Rscript jobs for every read group at once. It now waits for some to finish before spawning the rest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2508 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 14:19:02 +00:00
depristo
21a50eedb5
Simple extension to VariantEval: --includeFilteredRecords will now keep filtered VCF records so you can see what the entire call set looks like. Looking forward to VariantEval v2 from Kiran.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2506 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:59:09 +00:00
depristo
8d13597a27
Temporary command-line support to enable rod walkers, if you know what you are doing this is safe.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2505 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:15:36 +00:00
ebanks
d8351cb9fc
Give Annotations access to rod data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2504 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 18:53:01 +00:00
ebanks
8b087305f3
Added back the MQ0 annotation - however, it's not yet standard (since mq0 reads are filtered out by default in the genotyper). But it'll work when using the Annotator as a standalone.
...
While I'm at it, change getPileup to getBasePileup to remove all of the deprecation warnings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2502 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 17:07:19 +00:00
hanna
a4b69d0adf
Misc bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2501 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 14:48:19 +00:00
depristo
c209ba55aa
More informative error message
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2499 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-05 13:55:20 +00:00
rpoplin
0a6bd5a270
CycleCovariate is now one-based so that 0 and -0 don't collide with each other. Solid recal modes now only change the inconsistent base and the previous base (along the direction of the read) instead of both the bases before and after. Removed estimatedNumberOfBins from the Covariate interface because it wasn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2498 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 20:52:15 +00:00
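The 0 versus -0 collision mentioned in this entry can be shown with a short sketch. It assumes, as a later commit message states, that the cycle was negated for second-of-pair reads at this point in the history; the class and method names are hypothetical.

```java
// Hypothetical sketch of why a signed cycle needs to be one-based.
class CycleSketch {
    /** Zero-based: the first base of both reads in a pair maps to 0, since -0 == 0. */
    static int zeroBasedCycle(int offset, boolean secondOfPair) {
        return secondOfPair ? -offset : offset;
    }

    /** One-based: the first base maps to 1 or -1, so the two reads stay distinct. */
    static int oneBasedCycle(int offset, boolean secondOfPair) {
        int cycle = offset + 1;
        return secondOfPair ? -cycle : cycle;
    }
}
```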
ebanks
ed2fff13aa
-Misc improvements to VCF code
...
-Small fix to callset concordance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2497 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 02:28:47 +00:00
hanna
29c129aced
Added very primitive read fishing walker with lots of hard coding. Fixed bugs encountered when testing read fishing in Ecoli.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2496 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 00:54:57 +00:00
ebanks
7b702b086f
You don't need to be bi-allelic to have a non-ref alt allele frequency, but you do have to be a variant.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2495 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 22:02:39 +00:00
ebanks
b668d32cf1
Updated the min mapping quality and min base quality defaults to be 10 in both cases (and updated all integration tests) as suggested by Mark.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2494 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 21:31:04 +00:00
hanna
b6ecc9e151
Support for ad-hoc reference sequences. Also reenabled BWA/Java integration test, which was commented out and the data backing it up deleted without my knowledge. Unfortunately, since the data was deleted, I had to regenerate the data and a new md5. Hopefully the aligner output is still correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2493 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-02 20:19:14 +00:00
asivache
ad549eacfd
Now that we changed how deletions are represented, got to update MD5...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2491 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 22:00:58 +00:00
asivache
46362ce532
In extended event lines, now prints deletions in verbose format as well (e.g. "-AAT")
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2490 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:57:20 +00:00
asivache
a18e31f5b8
If alignment context at the locus holds extended event, get rod metadata and (importantly) reference bases for the whole span of the event (if it is a deletion that is, insertions still have length 0 on the ref!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2489 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:56:25 +00:00
asivache
a41cb0701b
Now can generate verbose String representation of deletions (e.g. "-AAT") if reference bases are provided as an argument to getEventStringWithCounts().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2488 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:54:50 +00:00
asivache
89791d730e
Compute and cache the length of the longest deletion observed at the site; ReadBackedExtendedEventPileup now has a getter to access that value.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2487 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:19:39 +00:00
asivache
9c41ac252f
Disable testSingleBPFailure - getReferenceContext() now would agree to accept length > 1 genome locs as its argument, so there's nothing to test...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2486 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:12:00 +00:00
asivache
8932e67325
Removed sanity check that required GenomeLoc argument to be strictly 1-base long. We need to relax this in order to be able to pass around a reference context containing full-length chunk of deleted reference bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2485 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 20:14:08 +00:00
hanna
497ae700c4
A rethink of the existing BAM block extraction code: rather than working in chunk space directly, stream data in block space, converting to chunk space on demand.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2484 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:19:51 +00:00
rpoplin
80658fd99e
AnalyzeCovariates gets the same performance improvements as the recalibrator. NHashMap class is removed completely.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2483 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:10:10 +00:00
rpoplin
9b2733a54a
Misc clean up in the recalibrator related to the nested hash map implementation. CountCovariates no longer creates the full flattened set of keys and iterates over them. The output csv file is in sorted order by default now but there is a new option -unsorted which can be used to save a little bit of run time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2482 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 16:58:04 +00:00
asivache
4aeb50c87d
Added: integration test for extended pileup (with indels included)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2481 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 23:02:23 +00:00
asivache
c928347c0c
Extended event pileups are more verbose now: following a sequence of 'D','I', and '.' symbols, actual distinct events are listed along with their counts (example: +AAA:3,+AAC:1 for the total of 4 indel observations with 3 reads showing +AAA and one read showing +AAC)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2480 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:44:18 +00:00
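The verbose event listing described in the commit above (for example `+AAA:3,+AAC:1`) can be illustrated with a small sketch. This is not the GATK code: the class name `EventCountSketch` and the method `formatEventCounts` are hypothetical, and events are assumed to arrive as plain strings, one per read that shows an indel.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the ReadBackedExtendedEventPileup API.
class EventCountSketch {
    /** Collapses per-read event strings into "event:count" pairs, in order of first appearance. */
    static String formatEventCounts(List<String> events) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String event : events) {
            counts.merge(event, 1, Integer::sum); // add one observation of this event
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(entry.getKey()).append(':').append(entry.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four indel observations: three reads with +AAA, one with +AAC.
        List<String> observed = Arrays.asList("+AAA", "+AAC", "+AAA", "+AAA");
        System.out.println(formatEventCounts(observed));
    }
}
```

A `LinkedHashMap` is used here only so the output order is deterministic; the ordering used by the real pileup is not stated in the commit message.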
asivache
8330058216
method added: getEventStringsWithCounts()
...
Returns list of Pairs <String,Integer>, where each pair consists of a unique indel event observed at the site and the total number of observations of that event. String representation for insertions is verbose (e.g. +ACT), while deletions are represented as "5D" (since read backed pileup has no reference information, so we can not get actual sequence of deleted bases)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2479 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:41:58 +00:00
asivache
cf3e59eb4a
back to archive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2478 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:00:38 +00:00
asivache
295d16572e
synch; will go back to archive in a sec
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2477 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:00:03 +00:00
asivache
e286313b67
Fix for reads that have insertion as their last (mapped) cigar elements (i.e. not followed by M)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2476 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:13:16 +00:00
hanna
05deb8796b
Simplify handling of reference sequence for unmapped reads. Improvement made based on a suggestion from Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2475 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:06:20 +00:00
rpoplin
96c4929b3c
Recalibrator now uses NestedHashMap instead of NHashMap. The keys are now nested hash maps instead of Lists of Comparables. This results in a big speedup (thanks Tim!). There is still a little bit of cleanup to do, but everything works now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2474 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:01:32 +00:00
depristo
7826e144a1
forgot to update md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2473 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:31:29 +00:00
asivache
bfd6bf9ec5
PileupWalker just got a new option: --showIndelPileups. When this option is used, two lines are printed for every genomic location that has indels associated with it: the first line is a conventional base pileup, the second line is an "extended event" (indel) pileup. The reference base in that second line is always set to "E" (for Extended), and the pileup string contains the symbols I, D, and . for insertion, deletion, and no event, respectively. Only this simple short format for indel pileups is implemented so far.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2472 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:16:34 +00:00
asivache
9652692019
Modified to enable locus traversals firing additional calls to walker's map() with alignment context filled with extended events (indels). Walker should override generateExtendedEvents() to return true, and it should make sure that it catches those additional indel pileups and processes them differently, as needed. If there are indels associated with a specific reference base, TWO map() calls will be issued in locus traversal at that location: first one will have a context filled with a regular base pileup, the second call will provide the context filled with indel pileup (pileup elements will have insertion, deletion, or noevent type associated with them and will also carry information about the full length of the event and inserted bases).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2471 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:13:25 +00:00
asivache
06eb576924
Can now be constructed with either base pileup or extended event (indel) pileup; has query methods checking what kind of pileup is served by the context, and getter methods return the appropriate pileup. TODO: while it is impossible right now to create a context that contains both types of pileups simultaneously, this restriction is only weakly enforced through the lack of appropriate constructor. Either we keep it this way, or some getters may become ambiguous and have to be fixed!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2470 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:07:29 +00:00
asivache
f445745c56
Pileup element and corresponding container class tweaked for representing pileups of extended events (indels) at a given locus. There's some redundancy with PileupElement and ReadBackedPileup (should we rename them to BasePileupElement and ReadBackedBasePileup?), so that abstracting a basic interface/abstract base from these classes can be considered in the future
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2469 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:03:39 +00:00
depristo
87e863b48d
Removed unused routines in duputils; duplicatequals to archive; docs for new duplicate traversal code; general code cleanup; bug fixes for combineduplicates; integration tests for combine duplicates walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2468 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 19:46:29 +00:00
depristo
29f94119d1
Fixes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2466 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 18:08:41 +00:00
ebanks
5fdf17fccb
Removed the VCF "NS" annotation (which wasn't working for pooled calls anyways) since it's ambiguous and not useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2465 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 17:30:47 +00:00
hanna
e32174fbc4
UnifiedGenotyper now works without -varout or -vf set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2464 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 16:46:24 +00:00
hanna
b125571a98
Intermediate check in: transfer responsibility of wrapping the GenotypeWriter around the output stream to the output management code. Currently, will not work when neither -varout nor -vf are specified, but should work in all other cases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2463 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 16:11:11 +00:00
ebanks
aeb34758e6
Adding a validation stringency to the VCF writers (which defaults to STRICT). If set to SILENT, it will not throw an exception for (reasonable) off-spec requests but will instead ignore such requests and silently move on.
...
This change allows the pooled calculation model to work correctly with multiple threads. Boys, the Genotyper is now officially parallelized.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2462 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 15:33:53 +00:00
rpoplin
29a3d9b47a
AnalyzeCovariates also has to skip over NO_DINUC
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2461 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 14:36:05 +00:00
aaron
a34c2442c0
moved hard-coded file paths to the oneKGLocation, validationDataLocation, and seqLocation variables setup in the BaseTest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2460 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 07:40:48 +00:00
depristo
9d263b2565
Integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2459 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:57:25 +00:00
depristo
fcc80e8632
Completely rewritten duplicate traversal, more free of bugs, with integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:56:49 +00:00
hanna
4617052b3c
For Alec, and others at the Broad who want to run our unit/integration tests off of gsa1/gsa2: put a ceiling on the amount of memory that integration tests can use. Reduce the memory footprint of the fasta reader test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2457 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:42:46 +00:00
alecw
b5e5e27225
New versions of picard-private, sam and picard jars for TileCovariate and regeneration of NM tag
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2456 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 22:18:55 +00:00
hanna
d4ee999ef9
Creates files supplemental to the reference sequence, consumed by BWA.
...
ANN - Alternate form of the sequence dictionary. Should be created from a sequence dictionary with full contig names.
AMB - A map of 'holes' in the genome, aka runs of non-ACGTacgt bases. This skeletal implementation always reports no holes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2455 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:40:44 +00:00
rpoplin
fcc52fbcd1
Fixed the build. Added missing import line.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2454 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:26:00 +00:00
ebanks
893c9c85fa
Added previous optimization to diploid (non-pool) model and shaved off 20% of runtime from it. Moved out some common functionality to joint estimate parent class.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2453 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:20:48 +00:00
rpoplin
92e3682991
Moved NHashMap to sting/utils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2452 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:57:32 +00:00
rpoplin
562db45fa5
Sites that were marked NO_DINUC no longer get dinuc-corrected but are still recalibrated using the other available covariates. Solid cycle is now the same as Illumina cycle pending an analysis that looks at the effect of PrimerRoundCovariate. Solid color space methods cleaned up to reduce number of calls to read.getAttribute(). Polished NHashMap sort method in preparation for move to core/utils. Added additional plots in AnalyzeCovariates to look at reported quality as a function of the covariate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2451 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:19:37 +00:00
asivache
2a704e83df
Reads now have new traversal flag: generateExtendedEvents(). Support added to GenomeAnalysisEngine and Walker. This is a silent and transparent framework change that no existing code is going to see. The actual code that makes use of the new flag (which is false by default) will be committed separately...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2450 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 19:52:44 +00:00
ebanks
c8d0e6e004
Optimization to pooled calculation model: stop calculating P(D|AF) if we are beyond the max likelihood such that subsequent likelihoods won't factor into the confidence score. Also, use new Pileup interface.
...
Pooled calling now takes less than half the time it used to.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2449 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 18:39:55 +00:00
ebanks
b1ac4b81d5
Optimization: look up diploid genotypes from a static matrix instead of creating them on the fly (with String.format); bases no longer need to be ordered appropriately
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2448 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 17:28:51 +00:00
andrewk
57516582c2
Converter from HapMap chip genotype data to VCF added; HapMapGenotypeROD adjusted to not convert from Hg18 to b36 formatting of contigs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2447 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 01:36:08 +00:00
ebanks
d2770f380c
Writing calls to standard out now works again (it got broken when we introduced parallelization)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2446 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-27 04:36:45 +00:00
ebanks
12990c5e7a
Added qual-by-depth annotation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2445 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-25 02:30:30 +00:00
ebanks
0571d9dcb9
Point MAX_QUAL_SCORE to SAMUtils.MAX_PHRED_SCORE.
...
Also, array size for caches should be max score + 1.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2444 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:47:32 +00:00
ebanks
438d21842a
The new recalibrator had been mimicking the behavior of the old one in that if there was no dinuc available (following a no-call base or at either end of a read), it didn't try to recalibrate. Now that Ryan has modularized the system, we no longer need to skip the base completely (we just need to skip the dinuc value)... which is good because the Picard people complained after realizing that cycle #1 never got recalibrated.
...
The major effects of this commit are as follows:
1. We no longer skip any good bases (of course, this change alone breaks every single integration test).
2. The dinuc covariate returns a "no dinuc" value for the first base of a read (but not for the last base anymore, since there is a valid dinuc) or if the previous base is a bad base (e.g. 'N').
I've done a bunch of testing on real data and everything looks right; however, let's wait until the recalibrator guru gets back from vacation next week and can double-check everything before shipping this out in another early access release.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2443 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:41:29 +00:00
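The "no dinuc" rule described in the commit above can be sketched as follows. This is a minimal illustration, not the recalibrator's covariate code: the class `DinucSketch`, the method `dinucAt`, and the `NO_DINUC` string sentinel are assumptions, and strand handling is left out entirely.

```java
// Hypothetical illustration of the "no dinuc" rule; not the GATK DinucCovariate.
class DinucSketch {
    static final String NO_DINUC = "NO_DINUC"; // assumed sentinel value

    static boolean isRegularBase(char base) {
        return "ACGT".indexOf(Character.toUpperCase(base)) >= 0;
    }

    /**
     * Returns the dinucleotide ending at the given offset, or NO_DINUC when the base
     * is the first in the read or the previous base is a bad base such as 'N'.
     * The last base of the read is not special: it has a valid previous base.
     * The current base itself is assumed to have been checked by the caller.
     */
    static String dinucAt(String readBases, int offset) {
        if (offset == 0) return NO_DINUC;
        char previous = readBases.charAt(offset - 1);
        if (!isRegularBase(previous)) return NO_DINUC;
        return "" + previous + readBases.charAt(offset);
    }

    public static void main(String[] args) {
        String read = "ACNGT";
        for (int i = 0; i < read.length(); i++) {
            System.out.println(i + " -> " + dinucAt(read, i));
        }
    }
}
```

Under this rule the base still reaches the other covariates; only the dinuc value is replaced by the sentinel.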
ebanks
aaf674d9db
Cleaned up this annotation.
...
Still experimental. As of now, it's not useful. More analysis is needed to determine how to handle cases where UG is unsure whether a sample is het or hom.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2442 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 03:06:46 +00:00
ebanks
6df40876a3
Un-reverted Matt's previous changes and fixed integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2441 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 02:47:00 +00:00
hanna
2bd0b1bbf7
After further review, it's unclear that my patch in RecalDataManager was the right choice. Reverting.
...
Also updating other IntervalCleanerIntegrationTest failures that were masked by my first patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2440 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 00:32:33 +00:00
hanna
98c268483e
Fixed issues with the integration tests:
...
1) sam-jdk apparently no longer supports custom tags with type int[] values.
2) BAM output for indel cleaner integration test changed in a way that's so subtle it can't be seen after converting the output to .sam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2439 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 23:12:22 +00:00
aaron
b134e0052f
added changes to the code to allow different types of interval merging,
...
1: all overlapping and abutting intervals merged (ALL),
2: just overlapping, not abutting intervals (OVERLAPPING_ONLY),
3: no merging (NONE). This option is not currently allowed, it will throw an exception. Once we're more certain that unmerged lists are going to work in all cases in the GATK, we'll enable that.
The command line option is --interval_merging or -im
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2437 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:59:14 +00:00
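The difference between the ALL and OVERLAPPING_ONLY merging rules listed in the commit above can be shown with a short sketch. It is an assumption-laden illustration rather than the GATK implementation: intervals are modelled as inclusive `int[]{start, stop}` pairs on a single contig, the input is assumed to be sorted by start, and the names `IntervalMergeSketch` and `MergeRule` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the merging rules; not the GATK interval code.
class IntervalMergeSketch {
    enum MergeRule { ALL, OVERLAPPING_ONLY, NONE }

    /** Merges sorted, inclusive [start, stop] intervals according to the rule. */
    static List<int[]> merge(List<int[]> sortedIntervals, MergeRule rule) {
        if (rule == MergeRule.NONE) {
            // The commit says this option is not currently allowed.
            throw new UnsupportedOperationException("unmerged interval lists are not allowed yet");
        }
        // ALL also merges abutting intervals (next start == previous stop + 1).
        int allowedGap = (rule == MergeRule.ALL) ? 1 : 0;
        List<int[]> merged = new ArrayList<>();
        for (int[] interval : sortedIntervals) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && interval[0] <= last[1] + allowedGap) {
                last[1] = Math.max(last[1], interval[1]); // extend the previous interval
            } else {
                merged.add(new int[] { interval[0], interval[1] }); // copy, leave input untouched
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<int[]> input = Arrays.asList(new int[] {1, 10}, new int[] {11, 20}, new int[] {15, 30});
        for (MergeRule rule : new MergeRule[] { MergeRule.ALL, MergeRule.OVERLAPPING_ONLY }) {
            StringBuilder sb = new StringBuilder(rule + ":");
            for (int[] iv : merge(input, rule)) sb.append(" ").append(Arrays.toString(iv));
            System.out.println(sb);
        }
    }
}
```

With the sample input, 1-10 and 11-20 abut but do not overlap, so only the ALL rule joins them.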
alecw
159778416c
In TableRecalibrationWalker, update UQ tag if it was present in the original SAMRecord. This required a new sam.jar, which caused some other files to need to be changed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2435 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:42:36 +00:00
hanna
87ff2b15d4
First step in introducing a patch to Picard: create our ideal interface into the BAM file for sharding.
...
This commit can iterate over the BAM file, pulling out information about the blocks in the file without actually loading or decompressing the reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2434 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:35:08 +00:00
ebanks
770093a40e
Oops - forgot to check this one in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2433 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 19:53:28 +00:00
ebanks
dc96879861
2 separate changes which both affect lots of UG integration md5s, so I'm committing them together:
...
1. allele balance annotation is now weighted by genotype quality (so we don't get misled by borderline het calls)
2. Updates to the Unified Genotyper for parallelization:
a. verbose writing now works again; arg was moved from UAC to UG
b. UG checks for commands that don't work with parallelization
c. some cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2432 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 19:03:56 +00:00
ebanks
872a9d1c7b
I'm making this change now (as opposed to waiting until Monday) to honor Tim's request.
...
The cycle covariate is now first/second of pair aware. I'm taking it on faith from both Chris Hartl (waiting on slides from him) and Tim that this is the right thing to do. We'll have Ryan confirm it all next week.
The only change is that if a read is the second of a pair, we multiply the cycle by -1 (a simple way of separating its index from that of its mate).
Of course, this broke all integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2431 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 16:26:43 +00:00
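The sign convention described in the commit above is small enough to state as code. This is only a sketch: the class `PairAwareCycleSketch` and the method `pairAwareCycle` are hypothetical, and the machine cycle is assumed to have been computed already as a positive integer.

```java
// Hypothetical illustration of the sign convention; not the GATK CycleCovariate.
class PairAwareCycleSketch {
    /** Negates the cycle for second-of-pair reads so the two mates never share a covariate value. */
    static int pairAwareCycle(int machineCycle, boolean secondOfPair) {
        return secondOfPair ? -machineCycle : machineCycle;
    }

    public static void main(String[] args) {
        System.out.println(pairAwareCycle(5, false)); // first of pair keeps its cycle
        System.out.println(pairAwareCycle(5, true));  // second of pair gets the negated cycle
    }
}
```

Because cycles start at 1 in this sketch, positive and negative values never collide, which is what keeps the two mates in separate recalibration bins.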
hanna
e29e8e52b9
Multithreading support for the unified genotyper. Tests on a 10Mbase region on pilot 1 show a 6.8x improvement when running 8 ways parallel.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2430 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 00:48:06 +00:00
kiran
164a94a3d0
Modified the walker documentation so that the stray punctuation wouldn't cause the GATK to stop parsing the help documentation early (aka I changed one word).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2429 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 20:50:01 +00:00
kiran
4ee6a478e3
Creates a table of reference allele percentage and alternate allele percentage at Hapmap-chip sites in a BAM file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2428 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 20:43:44 +00:00
ebanks
03bf75e335
Now implements TreeReducible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2427 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 17:52:51 +00:00
hanna
0d890e1bf0
Rework Eric's output management code given that the behavior of the UG changes drastically depending on its output format. Current implementation is probably a bit overkill-ish and we can whittle this down to what's absolutely necessary.
...
Writing VCFs to the 'out' protected printstream may not work at this moment.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2425 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 00:33:43 +00:00
ebanks
f448a263e9
The cleaner now cleans duplicate reads (instead of ignoring them) - although it doesn't include them for scoring ref or alt consensuses
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2424 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 21:01:55 +00:00
ebanks
cf303810d3
VCF reader now creates the correct type of header line for each header type
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 20:39:06 +00:00
ebanks
e06dfe44c4
Check for null platform (even when the read group isn't null) and assign it the default platform if it is
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2420 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 07:01:41 +00:00
ebanks
87e5a41964
Fixed a bug that accounted for a bunch of my remaining mis-cleaned indels.
...
Also, slightly optimized the cleaner by using readBases (instead of readString) and caching cigar element lengths.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2419 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 05:46:16 +00:00
hanna
b780ffb34a
Add a getFormat() method to get the output format from the writer. The need for this call suggests that I may be thinking about the typing of the GenotypeWriter object the wrong way.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2418 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 01:46:26 +00:00
hanna
11cbfcec9c
Get rid of backlink from ArgumentDefinitions to ArgumentSources. This will help in the future with multiple source -> single definition mapping sets.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2417 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 00:39:36 +00:00
hanna
9e53c06328
First revision of command-line argument support for GenotypeWriter. Also, fixed the damn build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2416 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 19:19:23 +00:00
ebanks
4ff61097cf
Trivial change: < -> <=
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2415 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:35:27 +00:00
ebanks
566b556b50
Give user ability to turn off max allowed interval size
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2414 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:20:22 +00:00
ebanks
a5f75cbfd4
The previous commit broke the build, so this is a temporary patch to get it to compile. ConcordanceTruthTable should use enums (esp. now that all of the concordance variables need to be public), but VariantEval will need to be rewritten soon anyways so I'll just push it off until then.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2413 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 02:34:41 +00:00
depristo
ee8bcdc61d
PooledConcordance calculations have been reformatted and bugs fixed. Now properly handles monomorphic sites. Also works with -G option now, correctly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2412 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:22:36 +00:00
depristo
9bf2d12c64
Misc. improvements to the LMW code. Support for emitting all sites, regardless of genotype. Min and max quality scores.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2411 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:20:57 +00:00
aaron
7e0f69dab5
Changed the GLF record to store its contig name and position in each record instead of in the Reader. Integration tests all stay the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:54:56 +00:00
hanna
80b3eb85fa
Fixed curiously epic failure in read-backed pileup: size() mismatched the numReads-numDeletions at that locus in the case where includeReadsWithDeletionsAtLoci == false, causing failures including bad output from pileup walker. Also fixed up ValidatingPileup to run with the new ReadBackedPileup instead of just compiling successfully.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2409 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:52:44 +00:00
rpoplin
fdf542c214
The CycleCovariate for 454 data is now the TACG flow cycle. That is, each flow grabs all the T's, A's, C's, and G's in order in a single cycle. This is changed from incrementing the cycle whenever there is a discontinuous nucleotide along the direction of the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2408 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:39:51 +00:00
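The TACG flow cycle described in the commit above can be sketched as follows. This is one reading of the commit message, not the actual CycleCovariate code: the class `FlowCycleSketch` and the method `flowCycles` are hypothetical, bases are assumed to be given in the direction of the read, cycles are assumed to start at 1, and a no-call base is assumed to inherit the current cycle.

```java
import java.util.Arrays;

// Hypothetical illustration of a TACG flow cycle; not the GATK CycleCovariate.
class FlowCycleSketch {
    private static final String FLOW_ORDER = "TACG";

    /** Assigns each base the flow cycle in which a T, A, C, G flow sequence would incorporate it. */
    static int[] flowCycles(String bases) {
        int[] cycles = new int[bases.length()];
        int cycle = 1;
        int currentFlow = 0; // index into FLOW_ORDER of the most recent flow used
        for (int i = 0; i < bases.length(); i++) {
            int flow = FLOW_ORDER.indexOf(Character.toUpperCase(bases.charAt(i)));
            if (flow >= 0) {
                // A base that comes earlier in the flow order than the previous one
                // can only be picked up in the next round of T, A, C, G flows.
                if (flow < currentFlow) cycle++;
                currentFlow = flow;
            }
            cycles[i] = cycle; // no-call bases keep the current cycle (assumption)
        }
        return cycles;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(flowCycles("TTACGGTA")));
    }
}
```

Homopolymer runs stay in one flow, so `TT` and `GG` in the sample read do not advance the cycle; only the step from G back to T does.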
aaron
c39675d2c1
VCFTool.java got left off of the last commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2407 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 21:33:53 +00:00
ebanks
4ea31fd949
Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 19:16:41 +00:00
ebanks
eeddf0d08e
Adding sample utils for convenience methods to pull out samples from e.g. SAMFileHeader or Genotype objects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2405 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 18:51:21 +00:00
chartl
79b997f43d
Minor fix to getValue (thanks Ryan!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2404 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:45:51 +00:00
aaron
9971a8da9a
adding a check to the RodVCF to ensure that records are in-order in the underlying VCF file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2403 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:24:45 +00:00
chartl
38563bbc2d
The values used to be integers (-1 for unpaired, 0 for unmapped, 1 for first, 2 for second), but I switched to strings before committing so that it was clearer. Forgot to update the OTHER getValue method.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2402 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:05:14 +00:00
chartl
7b5e332ff3
Added - PairedQualityScoreCountsWalker: counts quality scores (e.g. as a histogram) on first reads of a pair and second reads of a pair. Turns out there's a consistent difference in quality scores; even after recalibrating without the pair ordering as a covariate (there's a bit of averaging -- but not as much as I initially thought).
...
Added - A paired read order covariate to use with recalibration. Currently experimental: for instance, what's a proper pair versus just a pair? Nobody should use this one...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2401 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:01:01 +00:00
ebanks
4f59bfd513
Updates to the various GenotypeWriters to make them do simple things like write records (plus allow GLFReader to close).
...
Adding first pass of stub and storage classes for the GenotypeWriters so that UG can be parallelizable. Not hooked up yet, so UG is unchanged.
The mergeInto() code in the storage class is ugly, but it's all Tribble's fault. We can clean it up later if this whole thing works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2400 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 07:20:23 +00:00
ebanks
1cde4161b7
Fixed another test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2399 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 05:05:03 +00:00
ebanks
94f5edb68a
1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString)
...
2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant())
3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it.
4. Move 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization.
5. Improved some of the UG integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 04:14:14 +00:00
jmaguire
98839193b7
compatibility with VCF lib's switch to GenomeLoc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2397 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:52:48 +00:00
jmaguire
8787dd4c5e
Various and sundry additions to VCF tools. Some useful to the general public, some one-offs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2396 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:35:45 +00:00
rpoplin
6fbf77be95
Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 20:07:26 +00:00
hanna
07f1859290
Added integration test for running the recalibrator with no index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2393 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:10:53 +00:00
ebanks
c75ec67f84
When called as a standalone, VariantAnnotator now emits samples in sorted (as opposed to random) order in VCFs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2392 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:01:08 +00:00
rpoplin
aa86f3710d
Updating HomopolymerCovariate to only count the consecutive previous bases. I left the old code in, commented out, in case somebody wants to worry about carry-forward homopolymer problems.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2391 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 18:25:09 +00:00
hanna
b863fffdf6
Fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2390 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:55:00 +00:00
hanna
9143822822
Fix half-hearted attempt to try to move classes from package to package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2389 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:41:42 +00:00
asivache
e6cc7dab26
fixing md5 sum; new version of IndelIntervalWalker does the right thing...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2388 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:04:13 +00:00
asivache
acb4d477da
sync...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2387 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:03:01 +00:00
asivache
ba86508854
remove debug print command
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2386 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 00:00:01 +00:00
asivache
d72d332239
1) changed to search specifically for D and I cigar elements (and to properly process/ignore H, S, P elements) and print out only intervals that encompass actual indels. There's still at most one interval generated per read, which is the smallest interval that covers ALL indels (D or I elements) present in the read; 2) if an interval (and thus the original read itself and the indels in it) sticks out beyond the end of the chromosome, the read is ignored and this interval is NOT printed into the output; instead, a warning is printed to STDOUT (should we send it to logger.warn() instead?)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2385 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 23:29:07 +00:00
hanna
5b78354efd
Fixed NPE in index check with RefWalkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2384 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 22:37:45 +00:00
hanna
e6127cd6c5
Temporary hack for Tim Fennell: introduce a sharding strategy that stuffs all data into a single
...
shard for cases when the index file isn't available. Works for the case in question, but is not
guaranteed to work in general. Will be replaced once the new sharding system comes online.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2383 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:55:42 +00:00
ebanks
bef1c50b3b
Some cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2382 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:41:06 +00:00
ebanks
bb92e31118
Optimizations:
...
1. push the ReadBackedPileup filtering up into the ReadFilters for read-based filters
2. stop querying the cigar for its length (just do it once)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2381 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:39:58 +00:00
andrewk
36875fca89
Update documentation in the new help system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2380 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:33:12 +00:00
hanna
ee47eb4367
Make filters used available to the walker via getToolkit().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2379 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:26:04 +00:00
ebanks
b626fc0684
Joint Estimate is now the default calculation model.
...
Reworked all of the integration tests so that they're now more comprehensive, cover more of what we want to test, and don't take forever to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2376 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 19:41:02 +00:00
ebanks
e051311e8c
Added convenience methods in RodVCF to pull out all of the VCF data from the VCFRecord (e.g. getID(), getSamples(), getInfoValues())
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2374 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:58:41 +00:00
ebanks
bb312814a2
UG is now officially in the business of making good SNP calls (as opposed to being hyper-aggressive in its calls and expecting the end-user to filter).
...
Bad/suspicious bases/reads (high mismatch rate, low MQ, low BQ, bad mates) are now filtered out by default (and not used for the annotations either), although this can all be turned off.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2373 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:28:09 +00:00
aaron
af440943a4
Fixing a bug that Steven uncovered; we had an ambiguous contract for peek() in PushbackIterator, and SeekableRODIterator wasn't checking whether its PushbackIterator's hasNext() was true before calling peek().
...
Changed peek() to element() to be consistent with the Java standards of the Queue and Stack classes (element() throws an exception if a record isn't available).
Also updated some of the ROD iterator next() methods to throw NoSuchElementException if next() is called when a record isn't available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2372 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 23:04:40 +00:00
andrewk
1035abc85f
Add minimum base quality thresholding to depth of coverage via getBaseAndMappingFilteredPileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2371 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 22:58:30 +00:00