Commit Graph

2128 Commits (3e54e131e0c6484e7d12f428dbd0af9409c9dbc0)

Author SHA1 Message Date
chartl f51cffe220 Alteration of PlinkToVCF to be much more flexible about parsing .ped file headers, which can have one of a number of different standard fields, and be in different orders.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2650 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 18:02:28 +00:00
chartl 5b2a1e483e Renamed SequenomToVCF as PlinkToVCF. Wiki will be changed accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2649 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 17:35:20 +00:00
asivache 74779a9a78 First version of the tool that tries to determine the indel error rate (basically, it counts indels that look like sequencing/alignment errors, such as a single observation at a deeply covered locus, and reports the rate of their occurrence)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2648 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-21 15:28:20 +00:00
hanna d25a2fe120 Better handling of enums by the command-line argument system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2647 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 21:36:46 +00:00
ebanks 9c7b281b4f Set default value for max_coverage to be 100K (since 10K is too small).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2646 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 20:15:25 +00:00
hanna 1e9fe2a334 Clean up error output when enums have missing arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2645 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:48:26 +00:00
aaron 8d1d37302c A quick change to GLF to keep as much precision in our likelihoods for as long as possible before we put them into byte space. Sanger was doing a diff at low coverage and noticed our calls didn't contain as much precision as theirs. Updated the MD5 for unified genotyper output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2644 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:36:49 +00:00
hanna 908d399670 Bug fix for help text / version number - help text retriever was crashing in the debugger if help text hadn't been built.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2643 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 19:18:19 +00:00
chartl ab289872e4 Changes:
- Annotations return null when given pileups with no second-base information

- SequenomRodWithGenomeLoc -- better handling of indels

Eric, I made two small changes to the new Genotype interface that we should talk about (they basically have to do with allele/genotype representation):

Allele - added a new UNKNOWN_POINT_MUTATION to AlleleType. If I see a sequenom genotype AG, one's got to be ref and one's got to be SNP, but until I have
         an actual reference base in hand, I don't know which is which. That's what this entry is for.

Genotype - added an enum class StandardAttributes for dealing with things like deletion/inversion length. This is probably not the way we want to
         represent indels, so we should talk about this. Plus, now that there's a direct link between my ROD and the genotype, when we do decide
         how to deal with indels, we'll be forced to alter the SequenomRodWithGenomeLoc accordingly.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2642 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 16:45:17 +00:00
aaron a1b4cc4baf changes to intelligently log overflowing locus pile-ups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2640 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 08:09:48 +00:00
ebanks 4ac9eb7cb2 - Smarter strand bias calculation
- Better debug/verbose printing



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2639 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-20 03:01:26 +00:00
depristo ff66023d83 Trivial change to support filter field in VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2636 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:56:22 +00:00
asivache 4625261d79 Bug fix: alignments ending with 'I' were not counted toward the overall coverage, which resulted in inaccurate stats and, on rare occasions, outright messed-up ones.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2635 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 22:12:16 +00:00
hanna 8dafd26100 Print out the current version number in the application header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2633 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:58:36 +00:00
depristo 9e0ae993c7 -B 1kg_ceu,VCF,CEU.vcf -B 1kg_yri,VCF,YRI.vcf system supported to allow 1KG % (like dbSNP %)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2632 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:33:13 +00:00
rpoplin c98df0a862 Updated solid_recal_modes to work with bfast aligned data. Added an integration test that uses the BFAST file provided by TGen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2630 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 21:18:02 +00:00
chartl 53352e1bb4 First pass at a sequenom ROD. Nothing uses it; currently undergoing testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2629 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 17:09:36 +00:00
hanna 1488578617 Working with Aaron to get svnversion running within the build system. This change will break the build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2628 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 16:55:42 +00:00
rpoplin bca436578f Added the -maxQ argument to the list of arguments in the PG tag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2627 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:55:23 +00:00
rpoplin d61cafd19f Make the formatting of the list of args in the PG tag consistent.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2626 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:31:37 +00:00
rpoplin a12465b6d5 The recalFile argument is no longer added into the PG tag of a bam produced by TableRecalibration. Based on a request from the Sanger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2625 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-19 15:25:57 +00:00
rpoplin ba19afd529 Draft version of AnalyzeAnnotations which creates plots of cumulative TiTv ratio versus filter value per each annotation in the input VCF rod. Minor cleanup of recalibration walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2623 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 20:47:10 +00:00
kiran ff6877a15e Added a forgotten column label
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2622 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 01:00:52 +00:00
kiran dd6d5aadf9 Computes empirical confusion matrices, optionally with up to five bases of preceding context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2621 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-18 00:55:12 +00:00
ebanks 12453fa163 Misc cleanup of UG args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2620 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 04:38:52 +00:00
ebanks b8cdf64c20 Better descriptions for max reads/downsampling args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2618 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-17 02:30:27 +00:00
depristo 64225b28fd Convenience methods for getting the VCFReader and VCFRecord
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2614 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:31 +00:00
depristo d0af7f6c7b Now analyzes filtered SNPs like the all and novel subsets; support for selecting a single sample to analyze from a multi-sample VCF; support for trivial selection of records with an INFO field key/value pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2613 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:22:04 +00:00
depristo 8ae8e120f8 New annotateUnion operation -- provides clearer annotations on where a call came from when unioning two VCF call sets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2612 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:20:37 +00:00
depristo 41392f8ff5 Functions for setting genotype records and alternate bases; function for getting all rods implementing VCF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2611 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 20:19:43 +00:00
hanna ac4756db20 Add the svn version on the fly to the version number properties.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2607 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-16 00:28:01 +00:00
hanna 420cef4094 Added version numbers to the help doclet extractor. Since the help system is behaving
more like a resource bundle at this point, changed it over to use the Java ResourceBundle
support classes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2606 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 23:31:29 +00:00
rpoplin 4de7d6a59b Initial checkin of skeleton code for AnalyzeAnnotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2605 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:52:34 +00:00
hanna 930082314a Put a major.minor version into the GATK Javadoc for reading. Also,
update some straggler packages to the new package-info.java format introduced in 1.5.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2604 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:48:30 +00:00
mmelgar 3063224446 SecondaryBaseTransitionTableWalker now breaks by genotype and read group, is javadoc annotated, and is compatible with ReadBackedPileup's methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2603 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:43:39 +00:00
asivache 7a991421f7 -erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2601 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:11:14 +00:00
asivache c8c5c176cd -erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2600 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:07:49 +00:00
asivache a12933a26d Bug fixed: now the length of an insertion is determined correctly. Thought I committed this...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2599 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:58:48 +00:00
asivache 404b95183f This is a LocusWalker, not a RodWalker (thanks Mark!!). RodWalkers currently are not capable of attaching alignment contexts (reads) to the ROD-annotated loci they traverse over...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2596 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 20:33:41 +00:00
rpoplin 7078219b89 Updating outdated comments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2595 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 19:17:52 +00:00
rpoplin ba2acda406 Clarifying the comment regarding differentiating between first and second of pair in CycleCovariate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2594 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:36:14 +00:00
ebanks b911b7df82 Fixing the AC annotation to be in line with the VCF spec
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2593 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:28:52 +00:00
rpoplin f2e539c52f As per discussions with Tim we are reverting the previous change regarding PairedReadOrderCovariate. The CycleCovariate now differentiates between first and second of pair by multiplying the cycle by -1. PairedReadOrderCovariate has been removed completely.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2592 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 18:18:59 +00:00
asivache eae1b73945 Fixed a bug in left-adjusting the indels introduced in the previous commit :-/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2591 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 17:41:23 +00:00
rpoplin df998041a8 Minor change to solid warning message. Added note for a future solid recalibration integration test when we get the required data file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2590 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 16:31:25 +00:00
rpoplin 70df30fc1b Added method to AlignmentUtils which takes a read's cigar and the refBases char array given to a ReadWalker and returns the aligned reference char array. Bug fix in solid_recal_modes to use this aligned reference array. Recalibrator version number is no longer separate for each of the two walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2589 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 15:36:59 +00:00
ebanks 2a116bb5d6 Made the VCF validator a simple rod walker instead of having it be in a separate package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2588 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 06:39:06 +00:00
hanna b19bb19f3d First successful test of new sharding system prototype. Can traverse over reads from a single
BAM file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
aaron db9570ae29 Looks bigger than it is:
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code to the argument parsing system for default enums; we needed this so we could preserve the current unsafe flag and at the same time allow finer-grained control of unsafe operations. You can now specify:

"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 00:14:35 +00:00
kiran 04fdbbfa65 This is the beginning of a new version of VariantEval that can cut VCF files up in a variety of ways with JEXL expressions, select one sample out of a multi-sample VCF, and can load analysis modules dynamically.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2584 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-14 17:45:58 +00:00