Commit Graph

2265 Commits (fddca032bbbbf4759eb1403c8e7e5a9ae49fa220)

Author SHA1 Message Date
asivache 74779a9a78 First version of the tool that tries determining indel error rate (basically, counts indels that look like sequencing/alignment errors - such as a single observation at deeply covered locus, and reports the rate of their occurence)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2648 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 15:28:20 +00:00
hanna d25a2fe120 Better handling of enums by the command-line argument system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2647 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 21:36:46 +00:00
ebanks 9c7b281b4f Set default value for max_coverage to be 100K (since 10K is too small).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2646 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 20:15:25 +00:00
hanna 1e9fe2a334 Clean up error output when enums have missing arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2645 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:48:26 +00:00
aaron 8d1d37302c a quick change to GLF to keep as much precision in our likelihoods as long as possible, before we put it into byte space. Sanger was doing a diff at low coverage and noticed our calls didn't contain as much precision as theirs. Updated the MD5 for unified genotyper output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2644 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:36:49 +00:00
hanna 908d399670 Bug fix for help text / version number - help text retriever was crashing in the debugger if help text hadn't been built.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2643 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:18:19 +00:00
chartl ab289872e4 Changes:
- Annotations return null when given pileups with no second-base information

- SequenomRodWithGenomeLoc -- beter handling of indels

Eric; I made two small changes to the new Genotype interface that we should talk about (they basically have to do with allele/genotype representation):

Allele - added a new UNKNOWN_POINT_MUTATION to AlleleType. If I see a sequenom genotype AG; one's got to be ref, one's got to be SNP, but until I have
         an actual reference base in hand, I don't know which is which. That's what this entry is for.

Genotype - added an enum class StandardAttributes for dealing with things like deletion/inversion length. This is probably not the way we want to
         represent indels, so we should talk about this. Plus now that there's a direct link between my ROD and the genotype; when we do decide
         how to deal with indels, we'll be forced to alter the SequenomRodWithGenomeLoc accordingly.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2642 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 16:45:17 +00:00
aaron a1b4cc4baf changes to intelligently log overflowing locus pile-ups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2640 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 08:09:48 +00:00
ebanks 4ac9eb7cb2 - Smarter strand bias calculation
- Better debug/verbose printing



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2639 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 03:01:26 +00:00
depristo ff66023d83 Trivial change to support filter field in VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2636 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:56:22 +00:00
asivache 4625261d79 Bug fix: alignments ending with 'I' were not counted into the overall coverage which resulted in inaccurate stats, and in rare occasions outright messed up ones.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2635 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:12:16 +00:00
hanna 8dafd26100 Print out the current version number in the application header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2633 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:58:36 +00:00
depristo 9e0ae993c7 -B 1kg_ceu,VFC,CEU.vcf -B 1kg_yri,VCF,YRI.vcf system supported to allow 1KG % (like dbSNP%)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2632 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:33:13 +00:00
rpoplin c98df0a862 Updated solid_recal_modes to work with bfast aligned data. Added an integration test that uses the BFAST file provided by TGen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2630 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:18:02 +00:00
chartl 53352e1bb4 First pass at a sequenom ROD. Nothing uses it; currently undergoing testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2629 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 17:09:36 +00:00
hanna 1488578617 Working with Aaron to get svnversion running within the build system. This change will break the build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2628 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 16:55:42 +00:00
rpoplin bca436578f Added the -maxQ argument to the list of arguments in the PG tag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2627 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:55:23 +00:00
rpoplin d61cafd19f Make the formatting of the list of args in the PG tag consistent.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2626 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:31:37 +00:00
rpoplin a12465b6d5 The recalFile argument is no longer added into the PG tag of a bam produced by TableRecalibration. Based on a request from the Sanger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2625 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:25:57 +00:00
rpoplin ba19afd529 Draft version of AnalyzeAnnotations which creates plots of cumulative TiTv ratio versus filter value per each annotation in the input VCF rod. Minor cleanup of recalibration walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2623 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 20:47:10 +00:00
kiran ff6877a15e Added a forgotten column label
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2622 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 01:00:52 +00:00
kiran dd6d5aadf9 Computes empirical confusion matrices, optionally with up to five bases of preceding context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2621 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 00:55:12 +00:00
ebanks 12453fa163 Misc cleanup of UG args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2620 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 04:38:52 +00:00
ebanks b8cdf64c20 Better descriptions for max reads/downsampling args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2618 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 02:30:27 +00:00
depristo d8e74c5795 Update to MD5s for old tests and added extensive VCF testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2615 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:58 +00:00
depristo 64225b28fd Convenience methods for getting the VCFReader and VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2614 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:31 +00:00
depristo d0af7f6c7b Now analyzes filtered SNP like all, novel subsets; support for selecting a single sample to analyze from a multi-sample VCF, support for trivial selection of records with INFO field key/value pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2613 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:04 +00:00
depristo 8ae8e120f8 New annotateUnion operation -- provides clearer annotations on where a call came from when unioning two VCF call sets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2612 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:20:37 +00:00
depristo 41392f8ff5 functions for setting gentoype records and alternate bases; function for getting all rods implementing VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2611 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:19:43 +00:00
hanna ac4756db20 Add the svn version on the fly to the version number properties.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2607 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 00:28:01 +00:00
hanna 420cef4094 Added version numbers to the help doclet extractor. Since the help system is behaving
more like a resource bundle at this point, changed it over to use the Java ResourceBundle
support classes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2606 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 23:31:29 +00:00
rpoplin 4de7d6a59b Initial checkin of skeleton code for AnalyzeAnnotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2605 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:52:34 +00:00
hanna 930082314a Put a major.minor version into the GATK Javadoc for reading. Also,
update some straggler packages to the new package-info.java format introduced in 1.5.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2604 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:48:30 +00:00
mmelgar 3063224446 SecondaryBaseTransitionTableWalker now breaks by genotype and read group, is javadoc annotated, and is compatible with ReadBackedPileup's methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2603 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:43:39 +00:00
asivache 7a991421f7 -erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2601 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:11:14 +00:00
asivache c8c5c176cd -erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2600 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:07:49 +00:00
asivache a12933a26d Bug fixed: now the length of an insertion is determined correctly. Thought I committed this...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2599 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:58:48 +00:00
asivache 404b95183f This is a LocusWalker, not a RodWalker (thanks Mark!!). RodWalkers currently are not capable of attaching alignment contexts (reads) to the ROD-annotated loci they traverse over...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2596 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:33:41 +00:00
rpoplin 7078219b89 Updating outdated comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2595 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 19:17:52 +00:00
rpoplin ba2acda406 Clarifying the comment regarding differentiating between first and second of pair in CycleCovariate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2594 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:36:14 +00:00
ebanks b911b7df82 Fixing the AC annotation to be in line with the VCF spec
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2593 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:28:52 +00:00
rpoplin f2e539c52f As per discussions with Tim we are reverting the previous change regarding PairedReadOrderCovariate. The CycleCovariate now differentiates between first and second of pair by multiplying the cycle by -1. PairedReadOrderCovariate has been removed completely.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2592 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:18:59 +00:00
asivache eae1b73945 Fixed a bug in left-adjusting the indels introduced in previous commit :-/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2591 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 17:41:23 +00:00
rpoplin df998041a8 Minor change to solid warning message. Added note for a future solid recalibration integration test when we get the required data file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2590 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 16:31:25 +00:00
rpoplin 70df30fc1b Added method to AlignmentUtils which takes a read's cigar and the refBases char array given to a ReadWalker and returns the aligned reference char array. Bug fix in solid_recal_modes to use this aligned reference array. Recalibrator version number is no longer separate for each of the two walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2589 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 15:36:59 +00:00
ebanks 2a116bb5d6 Made the VCF validator a simple rod walker instead of having it be in a separate package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2588 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 06:39:06 +00:00
hanna b19bb19f3d First successful test of new sharding system prototype. Can traverse over reads from a single
BAM file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
aaron db9570ae29 Looks bigger than it is:
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code the argument parsing system for default enums, we needed this so we could preserve the current unsafe flag, and at the same time allow finer grained control of unsafe operations.  You can now specify:

"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 00:14:35 +00:00
asivache cff8b705c0 Oh, and the test would not work anymore...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2585 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:47:09 +00:00
kiran 04fdbbfa65 This is the beginning of a new version of VariantEval that can cut VCF files up in a variety of ways with JEXL expressions, select one sample out of a multi-sample VCF, and can load analysis modules dynamically.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2584 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:58 +00:00
asivache df63f51253 No changes, just sync-ing; only some commented out debugging prints are added...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2583 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:15 +00:00
asivache d85461c463 MergingIterator completely re-done. Now it is not a generic class (sorry guys), but rather it is tailored for merging ROD tracks. This implementation peeks the locations of next ROD annotations in each track, but does not actually read these RODs from underlying streams until the location is reached and it is time to actually return the object. Now underlying ROD track iterators (registered in the resource pool!) are not advanced prematurely past the current position and all the way to the next ROD record wherever it is, so that the sharding system can reuse them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2582 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:43:36 +00:00
asivache c0891d512f added: peekNextLocation(); it's quite hard (and probably unnecessary, ever) to make seekable iterator a peekable one, but it is quite easy and useful to be able to peek just the next location the iterator will jump to after next call to next()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2581 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:38:19 +00:00
rpoplin 9bf0d7250a Fixing the testOtherOutput UG integration test so it will run.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2580 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 13:40:14 +00:00
ebanks a082b948a3 Support throughout for S and N cigar elements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2579 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 03:45:42 +00:00
chartl 424d1b57f7 Sequenom to VCF now allows user to specify filters for QC, and they will appear in the filter field of the output VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2577 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 23:22:37 +00:00
rpoplin f96b2b211e My last checkin updating R code broke an unrelated UnifiedGenotyper integration test. Eric says that I should take out the verbose test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2576 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 22:28:10 +00:00
rpoplin 49c44e7b36 PairedReadOrderCovariate is now a standard covariate and because of this CycleCovariate no longer multiplies by negative one for second of pair reads. Added PairedReadOrderCovariate to some of the integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2574 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 20:09:10 +00:00
hanna 05575e2e56 Better bounding for the locus window. Don't make the locus window calculation blow up if the GenomeLoc ends
up being outside the reference.  Force the blowup elsewhere.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2573 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 17:03:54 +00:00
ebanks 8ca5bba738 We emit genotype data in the VCF record if the format string instructs us to (regardless of whether or not genotypes are provided - this was the wrong test).
SequenomToVCF now correctly has no-calls when probes fail.
Re-enabled SequenomToVCF integration test.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2572 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:40:27 +00:00
chartl 6d1107a4ed Update to SequenomToVCF
Output changing slightly so integration test disabled temporarily



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2571 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 15:32:05 +00:00
ebanks f99586f91b Added integration test for beagle and verbose output in UG.
Minor cleanup of VCFRecord code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2570 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 03:55:24 +00:00
hanna 02e23e2d9c Threading support for beagle output files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2569 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 02:42:16 +00:00
aaron 0513690416 two fixes in the new cached DbSNP code:
-isBiallelic would incorrectly say triallelic sites are biallelic.
-getAlternateAlleleList was broken, since the new cached list is immutable, we couldn’t remove list items.

Also added a dbSNP validating walker to the one-offs, for testing the new b37 130 dbSNP rod.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2568 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-13 00:27:34 +00:00
asivache a138bad95a A rare but not-so-subtle bug fixed: a funky alignment (a kind that should not have been generated in the first place) could make the indel left-adjusting method to overshoot read start and build a cigar like -3M6I...
also, few minor fix-ups.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2567 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 21:29:50 +00:00
rpoplin b51f4aae11 Updating the recalibrator to make use of StingSAMFileWriter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2566 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 20:58:27 +00:00
rpoplin c8ad025ad0 cleaning up unused import statements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2565 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:52:37 +00:00
rpoplin 189829841b The recalibrator now uses all input RODs when looking for known polymorphic sites not just the one named dbsnp. Added an integration test which uses both dbsnp and an input vcf file and skips over the union of the two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2564 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:50:39 +00:00
aaron 16777e3875 more fixes for the empty interval list problem; you can now run LocusWindow traversals with an empty interval list, but the GATK will give you a warning (unless you're running in unsafe mode).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2563 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:47:43 +00:00
hanna 35a4fcc481 Additional sanity checking: make sure the user can't alter the header / compression level / presorted state of a file to which SAMRecords have already been written.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2562 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:39:41 +00:00
ebanks 03b7d5f5c7 1. Fixed small but embarrassing bug in weighted Allele Balance annotation calculation.
2. Made RankSumTest abstract; added 2 subclasses: BaseQualityRST and MappingQualityRST (the latter based on a suggestion from Mark Daly).  Untested so they're still experimental.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2561 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:33:53 +00:00
hanna 58999a8e9d Enhance the I/O management system to support custom headers and set the presorted flag
from the initialize() method (or at any time before the first SAM record is written).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2560 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 18:21:42 +00:00
aaron 3c5f5177b1 check to see if the parsed interval list is empty, since we now allow interval files that are empty. If so, make sure we default to a non-interval based traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2559 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-12 17:52:27 +00:00
ebanks 040fdfee61 Cleaned up the interface to VCFRecord. It's now possible (and easy) to create records and then write them with a VCFWriter.
I've updated HapMap2VCF to use the new interface; Chris agreed to take care of Sequenom2VCF.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2558 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 21:42:12 +00:00
ebanks 42aff1d2c3 Annotator in general should be able to annotate monomorphic or tri-allelic sites.
It's up to the individual annotations to decide whether they want to annotate or not.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2556 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:52:18 +00:00
rpoplin 11f91b3c95 Reverting Eric's previous change because it killed the PG tag in the output bam file header. Added a new -compress command line argument to set the compression level of the output bam file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2555 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 19:02:56 +00:00
chartl dfa3c3b875 Added:
SequenomToVCF - Takes a sequenom ped file and converts it to a VCF file with the proper metrics for QC. It's currently a rough draft,
but is working as expected on a test ped file, which is included as an integration test.

Modified:

VCFGenotypeCall -- added a cloneCall() method that returns a clone of the call

Hapmap2VCF -- removed a VCFGenotypeCall object that gets instantiated and modified but never used
(caused me all kinds of confusion when I was basing SequenomToVCF off of it)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2554 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 17:17:21 +00:00
rpoplin 62dd2fa5be Fixing another bug in solid recal regarding negative strand reads. The isInconsistentColorSpace method incorrectly used the inconsistent tag added by parseColorSpace, the inconsistent tag is in the direction of the read like the color space tag, and not in the direction of the reference like everything else. This affects the recalibrated quality scores but the improvment in SNP calling performance is minor when using the default UG settings (min base quality 10).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2553 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-11 14:28:52 +00:00
ebanks 971834ca90 Added a walker to the vcf tools compilation: one that combines vcf records. Both merges and unions are supported (see documentation... when it gets written this week).
Also, moved some code that pulls samples out of rods from VCFUtils into SampleUtils.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2552 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 06:45:11 +00:00
ebanks 80af0f2f54 Changed the OUTPUT_BAM_FILE argument from String to SAMFileWriter and removed the call to close().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2551 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-10 03:45:54 +00:00
hanna 7893aaefe9 Updates to chunk iteration. Includes the return of the dreaded *2.java files;
hopefully I can find a way to kill these off before the Picard patch is ready.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2550 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 20:20:56 +00:00
ebanks fcce77c245 Added -beagle option to emit likelihoods file for use with the BEAGLE imputation engine; still experimental.
(Also converted getPileup -> getBasePileup)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2549 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 18:41:04 +00:00
rpoplin 9cbae53ee1 Bug fixes for both SET_Q_ZERO and REMOVE_REF_BIAS solid recal modes regarding proper handling of negative strand reads. These changes yield a minor improvment in HapMap sensitivity.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2548 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 15:19:22 +00:00
ebanks dfcd5ce25b Fixed broken test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2547 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:13:01 +00:00
ebanks d5ab002449 Curiously, it seems I never set the default base quality used by the Genotyper to 10. It's done now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2546 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 06:02:01 +00:00
ebanks b468369dfa -UG's call into VariantAnnotator now uses the full alignment context (as opposed to the filtered one)
-MQ0 annotation is now standard again
-Added AC and AN annotations to VCF output



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2545 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-08 05:40:42 +00:00
rpoplin f587ff46af Tile is now a standard covariate. By default the TileCovariate returns -1 if tile can't be derived from the read's name. Added a new command line option -throwTileException which will force TileCovariate to throw an exception if tile can't be derived for a read. Singleton covariates, such as any read group without tile info, must be skipped over in TableRecalibration so that the sequential formulation doesn't apply the same correction more than once. TileCovariate class has been added to the Early Access package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2544 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:51:41 +00:00
asivache d01bde36a4 Make sure that reference view holds enough bases to pass full-length deleted sequence to the walker's map() function in extended event mode (this addresses the problem of a deletion crossing the shard's boundary, so that an attempt to extract deleted bases results in a crash)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2543 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:37:22 +00:00
asivache e9bc85c188 Now has methods that allow to 1) check if a location is within the bounds of the reference view; 2) expand reference view (i.e. expand the bounds and reload the reference sequence) in order to accomodate specified location. The second method can be called directly since it performs a check and if the location is already within the bounds, then returns immediately. The costly ref sequence reloading occurs only when the location is not fully contained within the current bounds.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2542 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:35:17 +00:00
asivache 7f91b4d824 Bug fix. It would be nice if we could extract ROD annotations for the whole length of an extended event (indel), and we tried... But alas, it does not work with the current ROD system (after extracting length on ref > 1 ROD data for a deletion, rod iterator crashes on the attempt to re-load annotations for next reference base)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2541 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:30:55 +00:00
rpoplin 5f58492401 A rogue QualityUtils.MAX_REASONABLE_Q_SCORE managed to get through my previous bug fix. It should instead check the command line -maxQ argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2540 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:17:39 +00:00
ebanks c7a8dffa89 Check for division by 0 in annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2539 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 19:27:15 +00:00
ebanks 9a658e6b18 -Fixed VCF header line bug
-Added useful trim() method for Strings for characters other than whitespace


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2538 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 17:51:41 +00:00
ebanks b643a513bb Minor interface change for VCFGenotypeRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2537 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 16:48:09 +00:00
andrewk 431e9c2c8b Add dbSNP ID to VCF output records
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2536 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 15:30:04 +00:00
depristo 076481f786 Fixes to mergeVCF -- now correctly supports merging of filter fields. Also removed incorrect hasFilteringCodes() function. Updated intergration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2535 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 14:50:13 +00:00
rpoplin cea544871d Fixed an issue with recalibrating original quality scores above Q40. There is a new option -maxQ which sets the maximum quality score possible for when a RecalDatum tries to compute its quality score from the mismatch rate. The same option was added to AnalyzeCovariates to help with plotting q scores above Q40. Added an integration test which makes use of this new -maxQ option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2534 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 13:50:30 +00:00
ebanks 6c739e30e0 1. Removing an old version of the Genotype interface which is no longer being used. Needed to do this now so that the naming conflicts would cease.
2. Adding a preliminary version of the new Genotype/Allele interface (putting it into refdata/ as the VariantContext really only applies to rods) with updates to VariantContext.  This is by no means complete - further updates coming tomorrow.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2533 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 05:51:10 +00:00
depristo a9245a58e2 Fix for incorrect exception throwing in VCFRecord. It is reasonable to ask for the non-ref allele freq at all ref sites. Was only passing in tests because isReference was broken
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2532 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 01:18:30 +00:00
depristo 7215526810 Fix to isReference() in VCFRecord. Change to VariantCounter to correctly counter only non-genotype variants, as well as update to VariantEvalWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2531 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 00:03:29 +00:00