chartl
5b2a1e483e
Renamed SequenomToVCF as PlinkToVCF. Wiki will be changed accordingly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2649 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 17:35:20 +00:00
ebanks
9c7b281b4f
Set default value for max_coverage to be 100K (since 10K is too small).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2646 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 20:15:25 +00:00
aaron
8d1d37302c
a quick change to GLF to keep as much precision in our likelihoods for as long as possible, before we put them into byte space. Sanger was doing a diff at low coverage and noticed our calls didn't contain as much precision as theirs. Updated the MD5 for unified genotyper output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2644 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:36:49 +00:00
aaron
a1b4cc4baf
changes to intelligently log overflowing locus pile-ups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2640 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 08:09:48 +00:00
ebanks
4ac9eb7cb2
- Smarter strand bias calculation
...
- Better debug/verbose printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2639 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 03:01:26 +00:00
depristo
9e0ae993c7
-B 1kg_ceu,VCF,CEU.vcf -B 1kg_yri,VCF,YRI.vcf system supported to allow 1KG % (like dbSNP%)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2632 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:33:13 +00:00
rpoplin
c98df0a862
Updated solid_recal_modes to work with bfast aligned data. Added an integration test that uses the BFAST file provided by TGen.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2630 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:18:02 +00:00
ebanks
12453fa163
Misc cleanup of UG args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2620 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 04:38:52 +00:00
depristo
d8e74c5795
Update to MD5s for old tests and added extensive VCF testing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2615 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:58 +00:00
ebanks
b911b7df82
Fixing the AC annotation to be in line with the VCF spec
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2593 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:28:52 +00:00
rpoplin
f2e539c52f
As per discussions with Tim we are reverting the previous change regarding PairedReadOrderCovariate. The CycleCovariate now differentiates between first and second of pair by multiplying the cycle by -1. PairedReadOrderCovariate has been removed completely.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2592 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:18:59 +00:00
rpoplin
df998041a8
Minor change to solid warning message. Added note for a future solid recalibration integration test when we get the required data file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2590 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 16:31:25 +00:00
hanna
b19bb19f3d
First successful test of new sharding system prototype. Can traverse over reads from a single
...
BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
aaron
db9570ae29
Looks bigger than it is:
...
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code to the argument parsing system for default enums. We needed this so we could preserve the current unsafe flag and at the same time allow finer-grained control of unsafe operations. You can now specify:
"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 00:14:35 +00:00
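The "-U" behaviour described in the commit above (a bare flag enables every unsafe operation, a named value enables only that one) can be sketched roughly as follows. This is a minimal illustration, not GATK's argument parser: the class name `UnsafeFlagSketch`, the enum name `ValidationExclusion`, and the method `parseUnsafe` are hypothetical, and only the two enum constants quoted in the commit message are modelled.

```java
import java.util.EnumSet;

// Hypothetical sketch; names other than the two enum constants are assumptions.
class UnsafeFlagSketch {
    // Only the values quoted in the commit message; the real list is longer ("etc.").
    enum ValidationExclusion { ALLOW_UNINDEXED_BAM, NO_READ_ORDER_VERIFICATION }

    // value == null models a bare "-U"; otherwise it is the token that followed "-U".
    static EnumSet<ValidationExclusion> parseUnsafe(String value) {
        if (value == null) {
            // Default: allow all unsafe operations, preserving the old flag's meaning.
            return EnumSet.allOf(ValidationExclusion.class);
        }
        // valueOf throws IllegalArgumentException for an unknown name.
        return EnumSet.of(ValidationExclusion.valueOf(value));
    }

    public static void main(String[] args) {
        System.out.println(parseUnsafe(null));
        System.out.println(parseUnsafe("ALLOW_UNINDEXED_BAM"));
    }
}
```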
asivache
cff8b705c0
Oh, and the test would not work anymore...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2585 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:47:09 +00:00
rpoplin
9bf0d7250a
Fixing the testOtherOutput UG integration test so it will run.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2580 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 13:40:14 +00:00
chartl
424d1b57f7
Sequenom to VCF now allows user to specify filters for QC, and they will appear in the filter field of the output VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2577 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 23:22:37 +00:00
rpoplin
f96b2b211e
My last checkin updating R code broke an unrelated UnifiedGenotyper integration test. Eric says that I should take out the verbose test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2576 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 22:28:10 +00:00
rpoplin
49c44e7b36
PairedReadOrderCovariate is now a standard covariate and because of this CycleCovariate no longer multiplies by negative one for second of pair reads. Added PairedReadOrderCovariate to some of the integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2574 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 20:09:10 +00:00
ebanks
8ca5bba738
We emit genotype data in the VCF record if the format string instructs us to (regardless of whether or not genotypes are provided - this was the wrong test).
...
SequenomToVCF now correctly has no-calls when probes fail.
Re-enabled SequenomToVCF integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2572 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:40:27 +00:00
chartl
6d1107a4ed
Update to SequenomToVCF
...
Output changing slightly so integration test disabled temporarily
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2571 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:32:05 +00:00
ebanks
f99586f91b
Added integration test for beagle and verbose output in UG.
...
Minor cleanup of VCFRecord code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2570 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 03:55:24 +00:00
rpoplin
189829841b
The recalibrator now uses all input RODs when looking for known polymorphic sites not just the one named dbsnp. Added an integration test which uses both dbsnp and an input vcf file and skips over the union of the two.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2564 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:50:39 +00:00
aaron
16777e3875
more fixes for the empty interval list problem; you can now run LocusWindow traversals with an empty interval list, but the GATK will give you a warning (unless you're running in unsafe mode).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2563 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:47:43 +00:00
ebanks
03b7d5f5c7
1. Fixed small but embarrassing bug in weighted Allele Balance annotation calculation.
...
2. Made RankSumTest abstract; added 2 subclasses: BaseQualityRST and MappingQualityRST (the latter based on a suggestion from Mark Daly). Untested so they're still experimental.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2561 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:33:53 +00:00
ebanks
040fdfee61
Cleaned up the interface to VCFRecord. It's now possible (and easy) to create records and then write them with a VCFWriter.
...
I've updated HapMap2VCF to use the new interface; Chris agreed to take care of Sequenom2VCF.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2558 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 21:42:12 +00:00
ebanks
42aff1d2c3
Annotator in general should be able to annotate monomorphic or tri-allelic sites.
...
It's up to the individual annotations to decide whether they want to annotate or not.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2556 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:52:18 +00:00
chartl
dfa3c3b875
Added:
...
SequenomToVCF - Takes a sequenom ped file and converts it to a VCF file with the proper metrics for QC. It's currently a rough draft,
but is working as expected on a test ped file, which is included as an integration test.
Modified:
VCFGenotypeCall -- added a cloneCall() method that returns a clone of the call
Hapmap2VCF -- removed a VCFGenotypeCall object that gets instantiated and modified but never used
(caused me all kinds of confusion when I was basing SequenomToVCF off of it)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2554 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 17:17:21 +00:00
rpoplin
62dd2fa5be
Fixing another bug in solid recal regarding negative strand reads. The isInconsistentColorSpace method incorrectly used the inconsistent tag added by parseColorSpace: the inconsistent tag is in the direction of the read, like the color space tag, and not in the direction of the reference like everything else. This affects the recalibrated quality scores, but the improvement in SNP calling performance is minor when using the default UG settings (min base quality 10).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2553 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 14:28:52 +00:00
rpoplin
9cbae53ee1
Bug fixes for both SET_Q_ZERO and REMOVE_REF_BIAS solid recal modes regarding proper handling of negative strand reads. These changes yield a minor improvement in HapMap sensitivity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2548 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 15:19:22 +00:00
ebanks
dfcd5ce25b
Fixed broken test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2547 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:13:01 +00:00
ebanks
d5ab002449
Curiously, it seems I never set the default base quality used by the Genotyper to 10. It's done now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2546 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:02:01 +00:00
ebanks
b468369dfa
-UG's call into VariantAnnotator now uses the full alignment context (as opposed to the filtered one)
...
-MQ0 annotation is now standard again
-Added AC and AN annotations to VCF output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2545 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 05:40:42 +00:00
rpoplin
f587ff46af
Tile is now a standard covariate. By default the TileCovariate returns -1 if tile can't be derived from the read's name. Added a new command line option -throwTileException which will force TileCovariate to throw an exception if tile can't be derived for a read. Singleton covariates, such as any read group without tile info, must be skipped over in TableRecalibration so that the sequential formulation doesn't apply the same correction more than once. TileCovariate class has been added to the Early Access package.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2544 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:51:41 +00:00
rpoplin
5f58492401
A rogue QualityUtils.MAX_REASONABLE_Q_SCORE managed to get through my previous bug fix. It should instead check the command line -maxQ argument.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2540 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:17:39 +00:00
ebanks
c7a8dffa89
Check for division by 0 in annotations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2539 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 19:27:15 +00:00
ebanks
b643a513bb
Minor interface change for VCFGenotypeRecord.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2537 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 16:48:09 +00:00
depristo
076481f786
Fixes to mergeVCF -- now correctly supports merging of filter fields. Also removed incorrect hasFilteringCodes() function. Updated integration tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2535 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 14:50:13 +00:00
rpoplin
cea544871d
Fixed an issue with recalibrating original quality scores above Q40. There is a new option -maxQ which sets the maximum quality score possible for when a RecalDatum tries to compute its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting q scores above Q40. Added an integration test which makes use of this new -maxQ option.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 13:50:30 +00:00
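The effect of the -maxQ option described in the commit above can be illustrated with a Phred-scaled quality computed from a mismatch rate and capped at a configurable ceiling. This is a sketch under stated assumptions, not the RecalDatum code: the class and method names are hypothetical, and any smoothing the real recalibrator applies to the counts is left out.

```java
// Hypothetical sketch; not the actual RecalDatum implementation.
class MaxQSketch {
    // Phred-scaled quality Q = -10 * log10(mismatch rate), capped at maxQ.
    static int qualFromMismatchRate(long mismatches, long observations, int maxQ) {
        if (observations <= 0) {
            throw new IllegalArgumentException("need at least one observation");
        }
        if (mismatches == 0) {
            // A zero rate would give infinite quality, so return the cap directly.
            return maxQ;
        }
        double rate = (double) mismatches / observations;
        long q = Math.round(-10.0 * Math.log10(rate));
        return (int) Math.min(q, maxQ);
    }

    public static void main(String[] args) {
        // One mismatch in a million observations is Q60 before capping.
        System.out.println(qualFromMismatchRate(1, 1_000_000, 40));
        System.out.println(qualFromMismatchRate(1, 1_000_000, 60));
    }
}
```

With a cap of 40, any site whose mismatch rate implies a quality above Q40 is reported as Q40; raising the cap lets the higher value through.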
ebanks
6c739e30e0
1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease.
...
2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext. This is by no means complete - further updates coming tomorrow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 05:51:10 +00:00
depristo
7215526810
Fix to isReference() in VCFRecord. Change to VariantCounter to correctly count only non-genotype variants, as well as update to VariantEvalWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2531 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 00:03:29 +00:00
andrewk
6c4ac9e663
Updated HapMap2VCF to use the VCFGenotypeWriterAdapter interface; fixed bug in VCFParameters that affects VariantsToVCF and HapMap2VCF when reference is lower-cased; added integration test for HapMap2VCF that checks for the lower-case issue by testing against Hg18 region that has lower-cased bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2530 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 21:27:11 +00:00
depristo
8d13597a27
Temporary command-line support to enable rod walkers; this is safe if you know what you are doing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2505 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:15:36 +00:00
rpoplin
0a6bd5a270
CycleCovariate is now one-based so that 0 and -0 don't collide with each other. Solid recal modes now only change the inconsistent base and the previous base (along the direction of the read) instead of both the bases before and after. Removed estimatedNumberOfBins from the Covariate interface because it wasn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2498 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 20:52:15 +00:00
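The reason for the one-based cycle in the commit above is that negating a zero-based cycle for second-of-pair reads maps cycle 0 to -0, which is the same integer as 0. A minimal sketch of the idea follows; the class and method names are hypothetical, and strand and platform handling are ignored.

```java
// Hypothetical sketch; the real CycleCovariate also accounts for strand and platform.
class SignedCycleSketch {
    // zeroBasedOffset is the position of the base along the read, starting at 0.
    static int cycle(int zeroBasedOffset, boolean secondOfPair) {
        int oneBased = zeroBasedOffset + 1;        // first cycle is 1, never 0
        return secondOfPair ? -oneBased : oneBased; // negate to separate the mate
    }

    public static void main(String[] args) {
        System.out.println(cycle(0, false)); // first base, first of pair
        System.out.println(cycle(0, true));  // first base, second of pair
    }
}
```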
ebanks
ed2fff13aa
-Misc improvements to VCF code
...
-Small fix to callset concordance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2497 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 02:28:47 +00:00
ebanks
b668d32cf1
Updated the min mapping quality and min base quality defaults to be 10 in both cases (and updated all integration tests) as suggested by Mark.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2494 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 21:31:04 +00:00
hanna
b6ecc9e151
Support for ad-hoc reference sequences. Also reenabled BWA/Java integration test, which was commented out
...
and the data backing it up deleted without my knowledge. Unfortunately, since the data was deleted, I had
to regenerate the data and a new md5. Hopefully the aligner output is still correct.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2493 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-02 20:19:14 +00:00
asivache
ad549eacfd
Now that we changed how deletions are represented, got to update MD5...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2491 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 22:00:58 +00:00
asivache
9c41ac252f
Disable testSingleBPFailure - getReferenceContext() would now agree to accept length > 1 genome locs as its argument, so there's nothing to test...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2486 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:12:00 +00:00
asivache
4aeb50c87d
Added: integration test for extended pileup (with indels included)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2481 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 23:02:23 +00:00
rpoplin
96c4929b3c
Recalibrator now uses NestedHashMap instead of NHashMap. The keys are now nested hash maps instead of Lists of Comparables. This results in a big speed up (thanks Tim!). There is still a little bit of clean up to do, but everything works now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2474 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:01:32 +00:00
depristo
7826e144a1
forgot to update md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2473 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:31:29 +00:00
depristo
87e863b48d
Removed unused routines in duputils; duplicatequals to archive; docs for new duplicate traversal code; general code cleanup; bug fixes for combineduplicates; integration tests for combine duplicates walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2468 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 19:46:29 +00:00
ebanks
5fdf17fccb
Removed the VCF "NS" annotation (which wasn't working for pooled calls anyways) since it's ambiguous and not useful.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2465 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 17:30:47 +00:00
aaron
a34c2442c0
moved hard-coded file paths to the oneKGLocation, validationDataLocation, and seqLocation variables setup in the BaseTest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2460 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 07:40:48 +00:00
depristo
9d263b2565
Integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2459 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:57:25 +00:00
depristo
fcc80e8632
Completely rewritten duplicate traversal, more free of bugs, with integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:56:49 +00:00
hanna
4617052b3c
For Alec, and others at the Broad who want to run our unit/integration tests off of gsa1/gsa2: put a ceiling on the amount of memory that integration tests can use. Reduce the memory footprint of the fasta reader test.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2457 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:42:46 +00:00
alecw
b5e5e27225
New versions of picard-private, sam and picard jars for TileCovariate and regeneration of NM tag
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2456 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 22:18:55 +00:00
rpoplin
562db45fa5
Sites that were marked NO_DINUC no longer get dinuc-corrected but are still recalibrated using the other available covariates. Solid cycle is now the same as Illumina cycle pending an analysis that looks at the effect of PrimerRoundCovariate. Solid color space methods cleaned up to reduce number of calls to read.getAttribute(). Polished NHashMap sort method in preparation for move to core/utils. Added additional plots in AnalyzeCovariates to look at reported quality as a function of the covariate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2451 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:19:37 +00:00
ebanks
12990c5e7a
Added qual-by-depth annotation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2445 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-25 02:30:30 +00:00
ebanks
438d21842a
The new recalibrator had been mimicking the behavior of the old one in that if there was no dinuc available (following a no-call base or at either end of a read), it didn't try to recalibrate. Now that Ryan has modularized the system, we no longer need to skip the base completely (we just need to skip the dinuc value)... which is good because the Picard people complained after realizing that cycle #1 never got recalibrated.
...
The major effects of this commit are as follows:
1. We no longer skip any good bases (of course, this change alone breaks every single integration test).
2. The dinuc covariate returns a "no dinuc" value for the first base of a read (but not for the last base anymore, since there is a valid dinuc) or if the previous base is a bad base (e.g. 'N').
I've done a bunch of testing on real data and everything looks right; however, let's wait until the recalibrator guru gets back from vacation next week and can double-check everything before shipping this out in another early access release.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2443 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:41:29 +00:00
ebanks
6df40876a3
Un-reverted Matt's previous changes and fixed integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2441 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 02:47:00 +00:00
hanna
2bd0b1bbf7
After further review, it's unclear that my patch in RecalDataManager was the right choice. Reverting.
...
Also updating other IntervalCleanerIntegrationTest failures that were masked by my first patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2440 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 00:32:33 +00:00
hanna
98c268483e
Fixed issues with the integration tests:
...
1) sam-jdk apparently no longer supports custom tags with type int[] values.
2) BAM output for indel cleaner integration test changed in a way that's so subtle it can't be seen after converting the output to .sam.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2439 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 23:12:22 +00:00
aaron
b134e0052f
added changes to the code to allow different types of interval merging,
...
1: all overlapping and abutting intervals merged (ALL),
2: just overlapping, not abutting intervals (OVERLAPPING_ONLY),
3: no merging (NONE). This option is not currently allowed; it will throw an exception. Once we're more certain that unmerged lists are going to work in all cases in the GATK, we'll enable that.
The command line option is --interval_merging or -im
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2437 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:59:14 +00:00
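The three merging modes listed in the commit above can be sketched as follows. This is an illustration only, assuming closed 1-based intervals on a single contig that are already sorted by start; the class, enum, and method names are hypothetical and are not taken from the GATK source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ALL / OVERLAPPING_ONLY / NONE modes.
class IntervalMergeSketch {
    enum MergingRule { ALL, OVERLAPPING_ONLY, NONE }

    // Closed, 1-based interval [start, stop] on one contig.
    static final class Interval {
        final int start;
        final int stop;
        Interval(int start, int stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return start + "-" + stop; }
    }

    // The input list must be sorted by start position.
    static List<Interval> merge(List<Interval> sorted, MergingRule rule) {
        if (rule == MergingRule.NONE) {
            // Mirrors the commit message: unmerged lists are not allowed yet.
            throw new UnsupportedOperationException("NONE is not currently allowed");
        }
        List<Interval> out = new ArrayList<>();
        Interval cur = null;
        for (Interval next : sorted) {
            if (cur == null) { cur = next; continue; }
            boolean overlaps = next.start <= cur.stop;
            boolean abuts = next.start == cur.stop + 1;
            if (overlaps || (rule == MergingRule.ALL && abuts)) {
                cur = new Interval(cur.start, Math.max(cur.stop, next.stop));
            } else {
                out.add(cur);
                cur = next;
            }
        }
        if (cur != null) out.add(cur);
        return out;
    }

    public static void main(String[] args) {
        List<Interval> in = List.of(new Interval(1, 10), new Interval(11, 20), new Interval(15, 30));
        System.out.println(merge(in, MergingRule.ALL));
        System.out.println(merge(in, MergingRule.OVERLAPPING_ONLY));
    }
}
```

Under ALL the abutting pair 1-10 and 11-20 is joined, so the three inputs collapse into one interval; under OVERLAPPING_ONLY the first interval stays separate and only 11-20 and 15-30 are joined.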
ebanks
770093a40e
Oops - forgot to check this one in.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2433 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 19:53:28 +00:00
ebanks
dc96879861
2 separate changes which both affect lots of UG integration md5s, so I'm committing them together:
...
1. allele balance annotation is now weighted by genotype quality (so we don't get misled by borderline het calls)
2. Updates to the Unified Genotyper for parallelization:
a. verbose writing now works again; arg was moved from UAC to UG
b. UG checks for commands that don't work with parallelization
c. some cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2432 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 19:03:56 +00:00
ebanks
872a9d1c7b
I'm making this change now (as opposed to waiting until Monday) to honor Tim's request.
...
The cycle covariate is now first/second of pair aware. I'm taking it on faith from both Chris Hartl (waiting on slides from him) and Tim that this is the right thing to do. We'll have Ryan confirm it all next week.
The only change is that if a read is the second of a pair, we multiply the cycle by -1 (a simple way of separating its index from that of its mate).
Of course, this broke all integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2431 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 16:26:43 +00:00
ebanks
cf303810d3
VCF reader now creates the correct type of header line for each header type
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 20:39:06 +00:00
ebanks
87e5a41964
Fixed a bug that accounted for a bunch of my remaining mis-cleaned indels.
...
Also, slightly optimized the cleaner by using readBases (instead of readString) and caching cigar element lengths.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2419 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 05:46:16 +00:00
hanna
9e53c06328
First revision of command-line argument support for GenotypeWriter. Also, fixed the damn build.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2416 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 19:19:23 +00:00
aaron
7e0f69dab5
Changed the GLF record to store its contig name and position in each record instead of in the Reader. Integration tests all stay the same.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:54:56 +00:00
hanna
80b3eb85fa
Fixed curiously epic failure in read-backed pileup: size() mismatched the numReads-numDeletions at that locus in the case where includeReadsWithDeletionsAtLoci == false, causing failures including bad output from pileup walker. Also fixed up ValidatingPileup to run with the new ReadBackedPileup instead of just compiling successfully.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2409 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:52:44 +00:00
rpoplin
fdf542c214
The CycleCovariate for 454 data is now the TACG flow cycle. That is, each flow grabs all the T's, A's, C's, and G's in order in a single cycle. This is changed from incrementing the cycle whenever there is a discontinuous nucleotide along the direction of the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2408 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:39:51 +00:00
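The TACG flow cycle described in the commit above can be sketched as follows: the flow order T, A, C, G is walked along the read, and the cycle number increases each time the order wraps around. This is an illustration under assumptions, not the CycleCovariate code: the names are hypothetical, the bases are assumed to be given in the direction of the read, and a base outside T/A/C/G (such as N) simply inherits the current cycle.

```java
// Hypothetical sketch of a 454-style flow cycle assignment.
class FlowCycleSketch {
    private static final String FLOW_ORDER = "TACG";

    // Returns the flow cycle (starting at 1) for each base of the read.
    static int[] flowCycles(String bases) {
        int[] cycles = new int[bases.length()];
        int cycle = 1;
        int flow = 0; // index into FLOW_ORDER of the current flow
        for (int i = 0; i < bases.length(); i++) {
            char b = Character.toUpperCase(bases.charAt(i));
            if (FLOW_ORDER.indexOf(b) >= 0) {
                // Advance through the flows until the one that incorporates this base.
                while (FLOW_ORDER.charAt(flow) != b) {
                    flow++;
                    if (flow == FLOW_ORDER.length()) {
                        flow = 0;
                        cycle++; // wrapped past G: a new T-A-C-G cycle begins
                    }
                }
            }
            cycles[i] = cycle;
        }
        return cycles;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(flowCycles("TACGT")));
    }
}
```

A homopolymer run stays in one flow, so every base in the run gets the same cycle, and a read that visits T, A, C, G in order stays in cycle 1 until a base requires wrapping back to an earlier flow.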
ebanks
4ea31fd949
Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 19:16:41 +00:00
ebanks
1cde4161b7
Fixed another test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2399 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 05:05:03 +00:00
ebanks
94f5edb68a
1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString)
...
2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant())
3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it.
4. Move 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization.
5. Improved some of the UG integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 04:14:14 +00:00
rpoplin
6fbf77be95
Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 20:07:26 +00:00
hanna
07f1859290
Added integration test for running the recalibrator with no index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2393 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:10:53 +00:00
ebanks
c75ec67f84
When called as a standalone, VariantAnnotator now emits samples in sorted (as opposed to random) order in VCFs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2392 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:01:08 +00:00
hanna
b863fffdf6
Fix
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2390 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:55:00 +00:00
asivache
e6cc7dab26
fixing md5 sum; new version of IndelIntervalWalker does the right thing...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2388 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:04:13 +00:00
ebanks
b626fc0684
Joint Estimate is now the default calculation model.
...
Reworked all of the integration tests so that they're now more comprehensive, cover more of what we want to test, and don't take forever to run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2376 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 19:41:02 +00:00
ebanks
bb312814a2
UG is now officially in the business of making good SNP calls (as opposed to being hyper-aggressive in its calls and expecting the end-user to filter).
...
Bad/suspicious bases/reads (high mismatch rate, low MQ, low BQ, bad mates) are now filtered out by default (and not used for the annotations either), although this can all be turned off.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2373 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:28:09 +00:00
ebanks
874552ff75
Pull the genotype (and genotype quality) calculation out of the VCF code and into the Genotyper.
...
[Also, enable Mark's new UG arguments]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2355 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 04:29:28 +00:00
chartl
1389ac6bdf
Hurrr -- this uses power as part of its output. Changes to the power calculation broke the md5s RIGHT AFTER I HAD FIXED THEM arghflrg.
...
Will fix again.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2351 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 22:42:50 +00:00
chartl
b42fc905e8
Added - new tests (Hapmap was re-added)
...
Modified - Hapmap now takes a -q command to filter out variants by quality
Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions
Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:57:20 +00:00
rpoplin
8e44bfd2ef
CycleCovariate and PrimerRoundCovariate now correctly handle negative strand 454 and SOLID reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2349 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:52:30 +00:00
ebanks
97618663ef
Refactored and generalized the VCF header info code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 21:02:45 +00:00
ebanks
bd2a46ab4c
I want to move over to hpprojects tonight, so I'm checking in various changes all in one go:
...
1. Initial code for annotating calls with the base mismatch rate within a reference window (still needs analysis).
2. Move error checking code from rodVCF to VCFRecord.
3. More improvements to SNP Genotype callset concordance.
4. Fixed some comments in Variation/Genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2341 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 02:52:18 +00:00
aaron
09811b9f34
Now that we always output the VCF header, make sure that we correctly handle the situation where there are no records in the file. Added unit tests as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2333 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:51:05 +00:00
ebanks
2ea7632b76
The SNP genotype concordance module is now more comprehensive.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2330 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 18:34:33 +00:00
ebanks
2de7e1a178
Move VariantAnnotator over to use a StratifiedAlignmentContext split by sample.
...
The only major difference is that we are now able to get accurate allele balance ratios.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2321 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 05:28:28 +00:00
ebanks
e6f541fdca
Forgot to update integration test last night
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2308 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 12:57:10 +00:00
ebanks
ee691b8899
Added a whole bunch of unit tests for VCF reading.
...
We could still use more, but this is a good start.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2303 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 03:31:23 +00:00
ebanks
f7c44ad019
- Read in arguments for the header based on reflection
...
- Hook up Variation and Genotype in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2300 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:35:33 +00:00
ebanks
40c2d7a4bc
Fix all-bases-mode and genotype-mode in the UG and add integration tests for them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2295 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 17:41:30 +00:00
ebanks
4e54b91ce4
UG now outputs the FORMAT header fields when there's genotype data.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2294 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 16:31:07 +00:00
ebanks
fb900b12e1
VariantFiltration now details the filters it has used in the header of the VCF it produces.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2292 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 15:36:15 +00:00