asivache
6cf413e630
Bug: ExpandedSAMRecord did not treat hard-clipped bases ('H') correctly. Fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2680 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 19:23:44 +00:00
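Note on the hard-clip fix above: the commit does not show the ExpandedSAMRecord code, so what follows is only a minimal sketch of the rule the fix presumably has to respect. Under the SAM specification, a hard clip ('H') consumes neither stored read bases nor reference positions. The function name cigar_lengths and the Python setting are assumptions for illustration, not the project's code.

```python
import re

# Per the SAM specification: operators that consume stored read bases and
# operators that consume reference positions. 'H' (hard clip) and 'P'
# (padding) consume neither.
CONSUMES_READ = set("MIS=X")
CONSUMES_REF = set("MDN=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (read_bases, reference_span) for a CIGAR string.

    Hypothetical helper for illustration only.
    """
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings containing anything the token pattern did not cover.
    if "".join(n + op for n, op in tokens) != cigar:
        raise ValueError("malformed CIGAR: %r" % cigar)
    read_bases = sum(int(n) for n, op in tokens if op in CONSUMES_READ)
    ref_span = sum(int(n) for n, op in tokens if op in CONSUMES_REF)
    return read_bases, ref_span
```

With this rule, hard-clipped bases count toward neither total, while soft-clipped bases ('S') still count as read bases.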
chartl
4990139b60
A collection of python objects that are useful for VCF validation. Use 'em or don't.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2679 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 18:44:10 +00:00
ebanks
dc170caafc
Now, if a dbsnp rod is passed to either the UnifiedGenotyper or VariantAnnotator, a DB=0/1 annotation is added (in addition to filling in the ID field); this is in line with 1KG project calls. If no dbsnp rod is used, the annotation is not added (as opposed to setting every entry to DB=0).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2678 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 17:27:12 +00:00
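Note on the DB annotation change above: the behaviour described in the message can be sketched roughly as follows. This is an illustration only; annotate_db and its parameters are hypothetical names, not the UnifiedGenotyper or VariantAnnotator API, and the "." placeholder for a missing ID follows the usual VCF convention.

```python
def annotate_db(info, dbsnp_supplied, rs_id=None):
    """Return (id_field, info) following the rule in the commit message.

    Hypothetical helper: `info` is a dict of INFO annotations,
    `dbsnp_supplied` says whether a dbsnp rod was passed, and `rs_id`
    is the dbSNP identifier found at this site, if any.
    """
    info = dict(info)  # copy so the caller's dict is not mutated
    if not dbsnp_supplied:
        # No dbsnp rod: add no DB annotation at all (rather than DB=0).
        return ".", info
    # dbsnp rod present: DB=1 for known sites, DB=0 otherwise,
    # and fill in the ID field when the site is known.
    info["DB"] = 1 if rs_id else 0
    return (rs_id or "."), info
```

The point of the design is that DB=0 then means "checked against dbSNP and not found", which stays distinguishable from "never checked".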
rpoplin
5d2f8aaa54
Updating recalibrator version number after the several emergency changes last week.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2677 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-25 14:35:47 +00:00
jmaguire
588417e17d
Don't reference that optimization library I'm not using anyway.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2676 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:30:50 +00:00
jmaguire
d3e3c1c2e0
don't require that optimization lib that I'm not using yet... (doh)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2675 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:28:21 +00:00
jmaguire
1d6d2b26f7
tools for optimizing calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2674 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 20:16:55 +00:00
jmaguire
877957761f
lots of new stuff, some generally useful, some one-off.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2673 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 19:50:48 +00:00
ebanks
78890c0bee
First version of walker that combines the functionality of IndelIntervalWalker, MismatchIntervalWalker, SNPClusterWalker, and IntervalMergerWalker - plus it allows the user to input rods containing known indels (e.g. dbSNP or 1KG calls) for automatic cleaning. Basically, all pre-processing steps for cleaning are now done in a single pass.
...
More testing needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2672 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-24 05:32:38 +00:00
chartl
d6b9b788a8
Renamed -- PlinkRodWithGenomeLoc --> PlinkRod
...
Since binary files do not need encoded locus information in the SNP names there's no need to suggest that it is so in the name of the rod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2671 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 18:19:28 +00:00
chartl
ac983e7a0b
Ran the rod on a binary plink file with indels and it just worked. Love it when that happens! Unit test to ensure this behaviour is maintained.
...
****** PLINK ROD IS NOW READY TO GO ********
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2670 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 18:13:05 +00:00
chartl
ae22d35212
PlinkRod now correctly parses binary files without indels; unit test added for this behavior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2669 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 17:34:06 +00:00
chartl
94dc09c865
PlinkRod now successfully instantiates on the binary ped file trio (.bim, .bed, .fam) for non-indel files.
...
Upcoming: Test that the instantiation is correct, do it for indel-containing files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2668 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 16:13:24 +00:00
chartl
01db93299c
PlinkRodWithGenomeLoc now properly handles indels.
...
There is now a DELETION_REFERENCE allele type to allow for the storage of multi-base references rather than point-mutation references.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2667 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 07:34:52 +00:00
chartl
42fb85e7f3
PlinkRodWithGenomeLoc now properly parses text plink files. Unit test added to test this functionality. Indels and binary files to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2666 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 06:19:26 +00:00
hanna
648a36d08e
Temporary solution: add the commons logging implementation to the VariantFiltration package. Downstream solution is described in GSA-262.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2665 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-23 01:46:18 +00:00
depristo
c871a0f221
UG map() now returns a VariantCallContext object. Also has a field for confidentlyCalledBases. UG reduce() emits statistics on the percentage of confidently called bases.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2664 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 23:06:43 +00:00
chartl
fbf82526cb
Minor renaming changes.
...
PlinkRodWithGenomeLoc now supports .bed file parsing (and doesn't require |c#_p# conventions for SNPs -- still requires _g[I/D] for indels)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2663 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 23:06:32 +00:00
rpoplin
fd223e955c
Reverting the previous solid change. We now refuse to recalibrate if the solid read doesn't contain proper color space information. The exception message has been updated to say this. Also, Tile has been downgraded to an ExperimentalCovariate due to performance issues.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2662 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:55:28 +00:00
rpoplin
7732f98e56
Fix for Solid reads that have '.' in their color space field. The recalibrator will just set them to be illumina reads and won't apply color space correction.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2661 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:09:16 +00:00
aaron
2ea768d902
ant clean is your friend....fixed test code dependent on an interface change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2660 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 20:07:46 +00:00
rpoplin
a11503819a
AnalyzeAnnotations now breaks out its TiTv plots into novel SNPs, dbSNP sites, and combined.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2659 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-22 19:00:23 +00:00
hanna
e00cb688ac
Cleanup of GATK-GSA-Pipeline to support the new naming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2658 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 22:17:35 +00:00
aaron
cc3b818268
cleanup of the pile-up limit exceeded warning, and a little code cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2657 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 22:17:24 +00:00
hanna
de21943acd
A few more bug fixes based on extended testing. Sorry, Eric.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2656 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 22:12:16 +00:00
ebanks
c1e09efb23
- Fixed output for beagle header
...
- Better description for QualByDepth annotation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2655 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 21:25:56 +00:00
rpoplin
d9df72e1b5
AnalyzeAnnotations now bins variants for each annotation and outputs plots of TiTv ratio as a function of the annotation's value.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2654 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 21:15:11 +00:00
hanna
4fc926232c
More cleanup. Make sure resources are unioned among all the specified modules.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2653 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 19:38:46 +00:00
hanna
3e54e131e0
Cleanup and formatting overhaul.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2652 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 19:22:56 +00:00
hanna
ee421c106c
Autogenerate .tar.bz2 with embedded version number. Misc formatting changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2651 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 18:32:46 +00:00
chartl
f51cffe220
Alteration of PlinkToVCF to be much more flexible about parsing .ped file headers, which can have one of a number of different standard fields, and be in different orders.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2650 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 18:02:28 +00:00
chartl
5b2a1e483e
Renamed SequenomToVCF as PlinkToVCF. Wiki will be changed accordingly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2649 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 17:35:20 +00:00
asivache
74779a9a78
First version of the tool that tries to determine the indel error rate (basically, it counts indels that look like sequencing/alignment errors, such as a single observation at a deeply covered locus, and reports the rate of their occurrence).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2648 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 15:28:20 +00:00
hanna
d25a2fe120
Better handling of enums by the command-line argument system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2647 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 21:36:46 +00:00
ebanks
9c7b281b4f
Set default value for max_coverage to be 100K (since 10K is too small).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2646 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 20:15:25 +00:00
hanna
1e9fe2a334
Clean up error output when enums have missing arguments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2645 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:48:26 +00:00
aaron
8d1d37302c
a quick change to GLF to keep as much precision in our likelihoods for as long as possible, before we put them into byte space. Sanger was doing a diff at low coverage and noticed our calls didn't contain as much precision as theirs. Updated the MD5 for unified genotyper output.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2644 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:36:49 +00:00
hanna
908d399670
Bug fix for help text / version number - help text retriever was crashing in the debugger if help text hadn't been built.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2643 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:18:19 +00:00
chartl
ab289872e4
Changes:
...
- Annotations return null when given pileups with no second-base information
- SequenomRodWithGenomeLoc -- better handling of indels
Eric; I made two small changes to the new Genotype interface that we should talk about (they basically have to do with allele/genotype representation):
Allele - added a new UNKNOWN_POINT_MUTATION to AlleleType. If I see a sequenom genotype AG; one's got to be ref, one's got to be SNP, but until I have
an actual reference base in hand, I don't know which is which. That's what this entry is for.
Genotype - added an enum class StandardAttributes for dealing with things like deletion/inversion length. This is probably not the way we want to
represent indels, so we should talk about this. Plus now that there's a direct link between my ROD and the genotype; when we do decide
how to deal with indels, we'll be forced to alter the SequenomRodWithGenomeLoc accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2642 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 16:45:17 +00:00
depristo
cf46e3c85f
Valuable series of commands
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2641 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 13:58:59 +00:00
aaron
a1b4cc4baf
changes to intelligently log overflowing locus pile-ups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2640 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 08:09:48 +00:00
ebanks
4ac9eb7cb2
- Smarter strand bias calculation
...
- Better debug/verbose printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2639 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 03:01:26 +00:00
hanna
2261f57e5c
Rename modules to reflect the fact that they're really packages in their own right.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2638 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 01:13:17 +00:00
hanna
2f3fbc145d
A rethink of some of the modules from last night -- make the modules stand alone.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2637 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 00:59:21 +00:00
depristo
ff66023d83
Trivial change to support filter field in VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2636 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:56:22 +00:00
asivache
4625261d79
Bug fix: alignments ending with 'I' were not counted toward the overall coverage, which resulted in inaccurate stats and, on rare occasions, outright messed-up ones.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2635 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:12:16 +00:00
hanna
96a053c769
Port VariantEval and FindContaminatingReadGroups to modules.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2634 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:10:33 +00:00
hanna
8dafd26100
Print out the current version number in the application header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2633 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:58:36 +00:00
depristo
9e0ae993c7
-B 1kg_ceu,VCF,CEU.vcf -B 1kg_yri,VCF,YRI.vcf system supported to allow 1KG % (like dbSNP%)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2632 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:33:13 +00:00
kshakir
e936cbff1b
Removed experimental recalibration covariates.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2631 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:29:43 +00:00