Commit Graph

3518 Commits (804facb0cc4dc1a611f0bd3dbfe2b3733b1fee6a)

Author SHA1 Message Date
kiran 804facb0cc Removing these utilities as part of a hostage negotiation with Matt. Can I have my journal club paper now?!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3539 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 21:41:29 +00:00
weisburd c0370f4d0a Added both inclusive and exclusive filters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3538 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 18:40:41 +00:00
asivache e6d8faf293 making 'parseLocation' public static - as simple as the logic is, it's better kept in one place and I need it!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3537 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 18:19:59 +00:00
ebanks 8c28be5933 Fixing a VCF bug for Sendu: we weren't emitting flags (booleans) correctly in VCF3.3 (rev'ed tribble for this).
Updated dbsnp/hapmap membership info fields to be flags now instead of ints.
While I was there, I added the change in the Annotator for Jan to force reads to be from a specific sample.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3536 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 16:42:06 +00:00
aaron dde93e743f always output a brief test summary to the screen, and xml to disk
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3535 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 16:24:50 +00:00
ebanks 22620ba95c Adding "abi_solid" to the list of known platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3534 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 13:37:19 +00:00
ebanks 63ad71cca6 Fix busted code. Note for all:
String.valueOf(byte[]) doesn't work.  You must use new String(byte[]).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3533 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 05:01:48 +00:00
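The note in this commit follows from Java's overload resolution. String has no valueOf(byte[]) overload, so the call binds to String.valueOf(Object) and returns the array's toString() (a type tag plus identity hash, such as "[B@1b6d3586"), not the decoded text. The sketch below contrasts the two calls. The class name is hypothetical. Passing an explicit charset is our addition: the one-argument new String(byte[]) decodes with the platform default charset.

```java
import java.nio.charset.StandardCharsets;

final class ByteArrayToString {
    // Wrong: there is no valueOf(byte[]) overload, so this binds to
    // String.valueOf(Object) and returns the array's identity string.
    static String broken(byte[] bases) {
        return String.valueOf(bases);
    }

    // Right: decode the bytes. The explicit charset avoids depending on the
    // platform default used by the one-argument new String(byte[]).
    static String fixed(byte[] bases) {
        return new String(bases, StandardCharsets.US_ASCII);
    }
}
```

For the bytes of "ACGT", fixed(..) returns "ACGT". broken(..) returns a string starting with "[B@".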
weisburd 338bb9adf4 CommandLineProgram for measuring java I/O speeds for large plain-text or gzipped files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3532 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 21:34:37 +00:00
weisburd d1a4c4f0d3 Added -w filter option allowing user to specify chromosomes to be skipped.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3531 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:58:25 +00:00
weisburd 06fc5eecf8 Implemented TreeReducible - if num threads > 1, the output will be accumulated in memory and written to a vcf file at the end - in onTraversalDone(..). If num threads == 1, things will work as before - where vcf records are written to disk as soon as they are computed with map(..).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3530 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:57:23 +00:00
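The commit describes two output paths. With one thread, each record is written to disk as soon as map(..) computes it. With more than one thread, records are accumulated in memory, combined across threads, and written once in onTraversalDone(..). Below is a minimal sketch of that pattern. It is not the GATK walker API: the class name and the map/reduce/treeReduce/onTraversalDone signatures are hypothetical, and a java.io.Writer stands in for the VCF writer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical walker sketch; not the GATK TreeReducible interface itself.
final class BufferedOrStreamingWalker {
    private final int numThreads;
    private final Writer out;

    BufferedOrStreamingWalker(int numThreads, Writer out) {
        this.numThreads = numThreads;
        this.out = out;
    }

    // map: compute one record; in single-threaded mode write it immediately.
    String map(int position) {
        String record = "chr1\t" + position;
        if (numThreads == 1) write(record);
        return record;
    }

    // reduce: in multi-threaded mode keep the record in memory instead.
    List<String> reduce(String record, List<String> acc) {
        if (numThreads > 1) acc.add(record);
        return acc;
    }

    // treeReduce: combine two partial results, left (earlier loci) first.
    List<String> treeReduce(List<String> left, List<String> right) {
        List<String> merged = new ArrayList<>(left);
        merged.addAll(right);
        return merged;
    }

    // onTraversalDone: in multi-threaded mode, write everything at the end.
    void onTraversalDone(List<String> acc) {
        if (numThreads > 1) acc.forEach(this::write);
    }

    private void write(String record) {
        try {
            out.write(record);
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the left result ahead of the right one in treeReduce(..) preserves locus order when the buffered records are finally written. The cost of buffering is memory that grows with the number of records.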
weisburd 3b375cb237 Sped up parseGenomeLoc(..) by replacing regexp with String.indexOf(..) - attempt 2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3529 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:54:36 +00:00
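Both parseGenomeLoc(..) commits replace a regular expression with String.indexOf(..). Below is a minimal sketch of an indexOf-based parser. It assumes locations are written as contig, contig:start, or contig:start-stop, with 1-based inclusive positions. The class and field names are hypothetical. Comma-formatted positions and contig names containing ':' are not handled.

```java
final class GenomeLocParser {
    // Minimal location value: contig name plus 1-based, inclusive start/stop.
    static final class Loc {
        final String contig;
        final long start;
        final long stop;

        Loc(String contig, long start, long stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }
    }

    // Parses "contig", "contig:start" or "contig:start-stop" using indexOf
    // rather than a regular expression.
    static Loc parse(String s) {
        int colon = s.indexOf(':');
        if (colon < 0) {
            // Whole contig; Long.MAX_VALUE as "to the end" is a hypothetical convention.
            return new Loc(s, 1, Long.MAX_VALUE);
        }
        String contig = s.substring(0, colon);
        int dash = s.indexOf('-', colon + 1);
        long start;
        long stop;
        if (dash < 0) {
            start = Long.parseLong(s.substring(colon + 1));
            stop = start; // a single position
        } else {
            start = Long.parseLong(s.substring(colon + 1, dash));
            stop = Long.parseLong(s.substring(dash + 1));
        }
        if (contig.isEmpty() || start < 1 || stop < start) {
            throw new IllegalArgumentException("Bad genome location: " + s);
        }
        return new Loc(contig, start, stop);
    }
}
```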
aaron e27951ab39 re-updating the VCF code to handle spaces in sample names
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3528 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:18:34 +00:00
bthomas 99b684ea89 Adding new support for reference data. ReferenceDataSource is a new class that manages reference data, and allows IndexedFastaSequenceFile to be a simple reader. This checkin also includes FastaSequenceIndexBuilder, which reads a fasta file and creates an index, like samtools faidx. Right now this is not enabled, because we are still working out thread safety. So the only new UI change is that GATK can be run without a fai file. Soon, we will enable 1) GATK to be run without a dict file too, and 2) both dict and fai files will be saved on disk for future program executions. For more info, see ReferenceDataSource.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3527 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 20:10:23 +00:00
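The commit says FastaSequenceIndexBuilder builds an index like samtools faidx. Each .fai line holds five tab-separated fields: sequence name, sequence length, byte offset of the first base, bases per line, and bytes per line including the line terminator. Below is a single-pass sketch of computing those fields. It is not the GATK class. It reads from a byte array instead of a file, assumes '\n' line endings, and assumes every sequence line except the last has the same length.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FaiSketch {
    // One .fai record: name, length, offset of first base, bases per line, bytes per line.
    static final class Entry {
        final String name;
        final long offset;
        long length;
        long lineBases;
        long lineBytes;

        Entry(String name, long offset) {
            this.name = name;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return name + "\t" + length + "\t" + offset + "\t" + lineBases + "\t" + lineBytes;
        }
    }

    // Single pass over the FASTA bytes, tracking the byte offset of each line.
    static List<Entry> build(byte[] fasta) {
        List<Entry> entries = new ArrayList<>();
        Entry current = null;
        int pos = 0;
        while (pos < fasta.length) {
            int eol = pos;
            while (eol < fasta.length && fasta[eol] != '\n') eol++;
            int lineLen = eol - pos;                    // bytes before the newline
            int next = Math.min(eol + 1, fasta.length); // start of the following line
            if (lineLen > 0 && fasta[pos] == '>') {
                // Sequence name = header text up to the first whitespace.
                String header = new String(fasta, pos + 1, lineLen - 1, StandardCharsets.US_ASCII);
                current = new Entry(header.split("\\s+", 2)[0], next);
                entries.add(current);
            } else if (current != null && lineLen > 0) {
                if (current.lineBases == 0) {
                    // The first sequence line defines the line layout for this record.
                    current.lineBases = lineLen;
                    current.lineBytes = next - pos;
                }
                current.length += lineLen;
            }
            pos = next;
        }
        return entries;
    }
}
```

For the input ">chr1 desc\nACGT\nAC\n>chr2\nGGGG\n", this produces the records chr1 6 11 4 5 and chr2 4 25 4 5 (tab-separated).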
hanna f55f32d4ee Bug fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3526 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 01:53:26 +00:00
ebanks ca4eab1d23 Now annotations that require reads return null if there's no alignment context, so that running without reads adds annotations only for the appropriate fields.
Added an integration test for the read-less case.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3525 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 20:36:46 +00:00
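The contract in this commit works like this: an annotation that needs reads returns null when there is no alignment context, and the caller adds only the non-null results. The sketch below follows that contract. Everything in it is a hypothetical stand-in, not the annotator's actual API: the interface, the two example annotations, and the representation of the alignment context as a possibly-null list of read names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReadlessAnnotationSketch {
    // Hypothetical contract: return INFO key/value pairs for a site, or null
    // when the annotation cannot be computed there.
    interface Annotation {
        Map<String, Object> annotate(List<String> readsOrNull);
    }

    // Needs reads: returns null when there is no alignment context.
    static final Annotation DEPTH = reads ->
            reads == null ? null : Map.<String, Object>of("DP", reads.size());

    // Does not need reads: always returns a value (placeholder field).
    static final Annotation SOURCE = reads -> Map.<String, Object>of("SRC", "example");

    // Runs every annotation and keeps only the fields that were produced.
    static Map<String, Object> annotateSite(List<Annotation> annotations, List<String> readsOrNull) {
        Map<String, Object> info = new LinkedHashMap<>();
        for (Annotation annotation : annotations) {
            Map<String, Object> fields = annotation.annotate(readsOrNull);
            if (fields != null) info.putAll(fields);
        }
        return info;
    }
}
```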
aaron 6941c81bfa reverting revision 3522 to the old code until we fix the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3524 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 19:25:02 +00:00
hanna dbee21a50f Bugfixes for the case when no read groups / no samples are available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3523 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 18:47:05 +00:00
weisburd adc4c4e577 Sped up parseGenomeLoc(..) by replacing regexp with String.indexOf(..)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3522 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 18:11:43 +00:00
chartl 20167fd411 Final changes to MVC -- associates variants with regions of homozygosity in child and parents, corrects for genotype errors, and prints out a separate file with information for each region of homozygosity.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3521 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 18:05:37 +00:00
weisburd fdded73861 Improved error reporting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3520 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:52:48 +00:00
aaron 4f00e265a8 quick update for a change I implemented for Ryan
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3519 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:23:31 +00:00
aaron ad98512f6c adding changes so that we look at the headers already loaded by the engine for samples and other VCF utils, and not create readers for each file to get them (this caused Tribble to regenerate indices if the index file can't be written to disk).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3518 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:21:12 +00:00
weisburd c1b7bcc786 Fixed handling of mitochondrial genes - added special cases such as ATT being a start codon in mitochondria. Added warning if a gene doesn't start with Met or end in a stop codon
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3517 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:15:47 +00:00
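The start/stop check described in this commit can be sketched as follows. The codon sets are our assumption; the commit names only ATT. Nuclear genes use the standard code. For chrM/MT the sketch uses the vertebrate mitochondrial code (NCBI translation table 2), where ATA, ATT, ATC and GTG can also initiate, TGA encodes Trp, and AGA/AGG terminate. The contig names and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CodonCheck {
    // Assumed codon sets: standard code for nuclear genes.
    private static final Set<String> NUCLEAR_STARTS = Set.of("ATG");
    private static final Set<String> NUCLEAR_STOPS = Set.of("TAA", "TAG", "TGA");
    // Assumed codon sets: vertebrate mitochondrial code (NCBI table 2).
    private static final Set<String> MITO_STARTS = Set.of("ATG", "ATA", "ATT", "ATC", "GTG");
    private static final Set<String> MITO_STOPS = Set.of("TAA", "TAG", "AGA", "AGG");

    // Hypothetical contig naming for the mitochondrial genome.
    static boolean isMitochondrial(String contig) {
        return contig.equals("chrM") || contig.equals("MT");
    }

    // Returns warnings for an uppercase coding sequence (length a multiple of 3 assumed).
    static List<String> check(String contig, String cds) {
        List<String> warnings = new ArrayList<>();
        if (cds.length() < 6) {
            warnings.add("coding sequence too short");
            return warnings;
        }
        boolean mito = isMitochondrial(contig);
        String first = cds.substring(0, 3);
        String last = cds.substring(cds.length() - 3);
        if (!(mito ? MITO_STARTS : NUCLEAR_STARTS).contains(first)) {
            warnings.add("does not start with a start codon: " + first);
        }
        if (!(mito ? MITO_STOPS : NUCLEAR_STOPS).contains(last)) {
            warnings.add("does not end in a stop codon: " + last);
        }
        return warnings;
    }
}
```

This check would still flag mitochondrial transcripts whose stop codon is completed only by polyadenylation (ending in T or TA).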
weisburd 4f1181974b Added toString() method
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3516 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:12:57 +00:00
weisburd 6fd2d39a7d Modified run_locally mode to use os.system(..) instead of popen
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3515 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:10:03 +00:00
weisburd a3ccf49f5b Write error to stderr
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3514 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:09:10 +00:00
ebanks 9b2fcc4711 Refactoring of the annotation system:
1. VA is now a ROD walker so it no longer requires reads (needs a little more testing)
2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs)
3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong.  Fixed the headers too.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:05:51 +00:00
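The chromosome count annotations that moved into VA follow the VCF definitions. AN is the number of called alleles across all genotypes, AC is the number of alternate alleles among them, and AF = AC / AN. Below is a minimal sketch of one annotation emitting all three INFO fields at once. It assumes a biallelic site and a hypothetical genotype encoding: an array of allele indices per sample, with -1 for an uncalled allele.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChromosomeCountsSketch {
    // Each genotype is an array of allele indices (0 = ref, 1 = alt);
    // -1 marks an uncalled allele. This encoding is a hypothetical stand-in.
    static Map<String, Object> annotate(List<int[]> genotypes) {
        int an = 0;
        int ac = 0;
        for (int[] gt : genotypes) {
            for (int allele : gt) {
                if (allele < 0) continue; // no-call: not counted in AN
                an++;
                if (allele > 0) ac++;     // biallelic assumption: any non-ref is the alt
            }
        }
        // One annotation, several INFO key/value pairs.
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("AC", ac);
        info.put("AN", an);
        if (an > 0) info.put("AF", (double) ac / an);
        return info;
    }
}
```

For genotypes 0/1, 1/1, ./. and 0/0, this gives AN=6, AC=3, AF=0.5.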
hanna 84563b37e5 Partial flattening of the hanger data structure. Hanger data structure is
not currently as flat as it could / should be, but it's already comparable
to the speed of the reference implementation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3512 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 16:28:49 +00:00
chartl 8f9e3e8ad7 Commit for Kiran; but this is now working, barring little exceptions that I've yet to run across...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3511 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 14:21:19 +00:00
aaron 6febd0291d rev tribble to include some dbsnp clean-up and fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3510 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 03:08:31 +00:00
aaron 6d5556939d updating Tribble with a couple of important Tabix fixes, and updating the variant eval integration tests to run each test with both plain vcf and gzipped tabix (added the tabix version
to the validation directory), using the same md5sum.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3509 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 01:47:04 +00:00
weisburd 2b31975cb4 Added more options for coordinate systems - now you can add 1 to either the start coordinates, the end coordinates, or both
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3508 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 22:49:19 +00:00
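The new options add 1 to the start, the end, or both. One way to use them is to bring an input coordinate convention to 1-based, closed intervals. For example, UCSC tables store a 0-based start and an exclusive end, so only the start needs +1, while a 0-based closed interval needs both. The enum and method names below are hypothetical, and the 1-based closed target is our assumption about how the options are used.

```java
final class CoordinateShift {
    // Hypothetical names for the options described in the commit.
    enum Mode { NONE, START_PLUS_ONE, END_PLUS_ONE, BOTH_PLUS_ONE }

    // Returns {start, end} after applying the chosen +1 adjustment.
    static long[] apply(Mode mode, long start, long end) {
        boolean shiftStart = mode == Mode.START_PLUS_ONE || mode == Mode.BOTH_PLUS_ONE;
        boolean shiftEnd = mode == Mode.END_PLUS_ONE || mode == Mode.BOTH_PLUS_ONE;
        return new long[] {shiftStart ? start + 1 : start, shiftEnd ? end + 1 : end};
    }
}
```

For instance, a UCSC-style interval 0-100 becomes 1-100 with START_PLUS_ONE, covering the same 100 bases.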
weisburd 410afcdf2c Added parallelization options - when running locally, multiple processes can be spawned, or a -nt arg can be specified to run each TranscriptToInfo instance multi-threaded
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3507 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 22:48:07 +00:00
weisburd 92c72d3361 Added back lines that update the *big-table-header.txt file before using it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3506 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 22:45:41 +00:00
weisburd 3c24223d02 Script for concatenating 2 AnnotatorInputTables, and writing the result to standard out. Merge-sorts the 2 tables while concatenating them
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3505 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 22:44:16 +00:00
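Merge-sorting two tables while concatenating them is the merge step of merge sort: if each input is already sorted, one linear pass yields a sorted result without re-sorting everything. The sketch below is in Java rather than the script's own language. It makes assumptions that are not in the commit: header lines are already stripped, the first tab-separated column of each row is a contig:position key, and the caller supplies the contig order (unknown contigs sort first).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedMerge {
    // Merges two lists that are each already sorted by cmp into one sorted list.
    // On ties the row from `a` comes first, so the merge is stable.
    static <T> List<T> merge(List<T> a, List<T> b, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (cmp.compare(a.get(i), b.get(j)) <= 0) out.add(a.get(i++));
            else out.add(b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Hypothetical row order: contig by caller-supplied order, then numeric position.
    static Comparator<String> byLocus(List<String> contigOrder) {
        return Comparator
                .comparingInt((String row) -> contigOrder.indexOf(contig(row)))
                .thenComparingLong(SortedMerge::position);
    }

    private static String key(String row) {
        int tab = row.indexOf('\t');
        return tab < 0 ? row : row.substring(0, tab);
    }

    private static String contig(String row) {
        String k = key(row);
        return k.substring(0, k.indexOf(':'));
    }

    private static long position(String row) {
        String k = key(row);
        return Long.parseLong(k.substring(k.indexOf(':') + 1));
    }
}
```

Comparing positions numerically matters here: a plain string sort would put chr1:10 before chr1:7.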
hanna c2858c8988 Minor performance enhancement. Checkpoint commit before major performance
overhaul.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3504 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 21:39:39 +00:00
chartl 5ed2818ffb Forgot to commit code i relied upon
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3503 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 21:01:35 +00:00
chartl 736098b58d A quick commit before running home. This is a re-factored version of the OppositeHomozygoteClassifier which will work with deNovo violations as well. Some code still needs to be migrated from OHC which is why that walker isn't yet deleted. This'll be up and running tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3502 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 20:47:01 +00:00
delangel de134c226d Removed ability of users to specify annotations to recompute, cleanups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3501 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 19:17:59 +00:00
ebanks 4d1a6b3d99 quick changes for G
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3500 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 16:33:27 +00:00
delangel 907931c902 a) Update annotations when creating new vcf with Beagle's imputed data. Since genotypes may (will) change based on imputation, several annotations need to be updated. By default, AC, AF, AN and AB will be updated. User can force extra annotations to be updated with -A <annotation> argument.
b) Several cleanups and beautifications.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3499 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 15:12:04 +00:00
chartl 933133ee28 Initial commit of the opposite homozygote classifier. Currently does the following, given a trio vcf:
 + Identifies opposite homozygote sites
 + Identifies the parent from whom it is expected that a null allele was inherited (or whether it was a putative genotype error; e.g. mom=homref, dad=homref, child=homvar)
 + Labels each opposite homozygote with its homozygous region in the child (e.g. region 1, region 2)
 + Labels each opposite homozygote with the size of the homozygous region in which it was found, the number of child homozygotes in the region, and the number of opposite homozygote violations within that region

To come:
 + Classification of sites as likely tri-allelic


Note that this is very experimental



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3498 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 03:56:07 +00:00
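At an opposite homozygote site, the child and one parent are homozygous for different alleles. Mendelian inheritance only allows this if that parent transmitted a null allele, or if a genotype is wrong. Below is a minimal sketch of the per-site classification and of numbering the child's homozygous runs. Genotypes are coded as alternate-allele counts (0, 1, 2). The names and the region definition (consecutive homozygous child sites in input order) are hypothetical simplifications. Tri-allelic classification is not attempted.

```java
final class OppositeHomozygoteSketch {
    enum Call { NOT_OPPOSITE, NULL_FROM_MOM, NULL_FROM_DAD, PUTATIVE_GENOTYPE_ERROR }

    // Genotypes are alternate-allele counts: 0 = hom-ref, 1 = het, 2 = hom-var.
    static boolean oppositeHomozygotes(int parent, int child) {
        return (parent == 0 && child == 2) || (parent == 2 && child == 0);
    }

    // If both parents are opposite to the child (e.g. mom=homref, dad=homref,
    // child=homvar), treat it as a putative genotype error, as in the commit.
    static Call classify(int mom, int dad, int child) {
        boolean momOpposite = oppositeHomozygotes(mom, child);
        boolean dadOpposite = oppositeHomozygotes(dad, child);
        if (momOpposite && dadOpposite) return Call.PUTATIVE_GENOTYPE_ERROR;
        if (momOpposite) return Call.NULL_FROM_MOM;
        if (dadOpposite) return Call.NULL_FROM_DAD;
        return Call.NOT_OPPOSITE;
    }

    // Numbers runs of consecutive homozygous child genotypes as regions 1, 2, ...;
    // heterozygous sites get 0.
    static int[] homozygousRegions(int[] childGenotypes) {
        int[] region = new int[childGenotypes.length];
        int current = 0;
        boolean inRun = false;
        for (int i = 0; i < childGenotypes.length; i++) {
            boolean hom = childGenotypes[i] != 1;
            if (hom && !inRun) current++; // a new homozygous run starts here
            inRun = hom;
            region[i] = hom ? current : 0;
        }
        return region;
    }
}
```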
hanna 199e4208cd Bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3497 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 00:30:33 +00:00
hanna 52ab9f2417 Feature parity between LocusIteratorByState, DownsamplingLocusIteratorByState, including pushing mrl /
the LocusOverflowTracker into LocusIteratorByState.  Note that the 'Matt Hanna exception' is still enabled
because I haven't yet validated the performance of the DownsamplingLocusIteratorByState when running
without downsampling.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3496 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 22:58:21 +00:00
hanna 5c4d070566 Push Mark's changes in LocusIteratorByState into DownsamplingLocusIteratorByState
in preparation for merging the two into one.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3495 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 17:29:30 +00:00
depristo 6eeb1693ca JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 00:33:02 +00:00
hanna c1ecf75dd5 Update to the latest rev of the picard sharding patch. Includes updates reflecting
the imminent move of IlluminaUtil into picard public.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3493 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 20:33:21 +00:00
delangel c503f01dcf More cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3492 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:41:38 +00:00
delangel d4c66d6191 a) Small cleanup
b) Fix major issue with Beagle likelihood converter: if likelihood triplets from UG end up being too low, then the Beagle input file will be produced with a 0.00,0.00,0.00 triplet. If all samples at a marker have this issue, Beagle will effectively produce junk. To fix, likelihoods are renormalized before converting to linear space.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3491 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:31:59 +00:00
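The renormalization fix can be sketched as follows, assuming the UG triplet is in log10 space as in VCF GL fields. When all three values are very negative, converting each one directly with 10^x underflows or prints as 0.00. Subtracting the largest value first maps the most likely genotype to 1.0 and keeps the ratios between genotypes. Scaling the triplet to sum to 1 and the two-decimal output format are assumptions of this sketch, not necessarily what the converter does.

```java
import java.util.Locale;

final class BeagleLikelihoods {
    // Converts a log10 likelihood triplet (hom-ref, het, hom-var) to linear
    // space after shifting by the maximum, so the best genotype maps to 1.0.
    static double[] toLinear(double[] log10Likelihoods) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : log10Likelihoods) max = Math.max(max, l);
        double[] linear = new double[log10Likelihoods.length];
        double sum = 0;
        for (int i = 0; i < linear.length; i++) {
            linear[i] = Math.pow(10, log10Likelihoods[i] - max);
            sum += linear[i];
        }
        // Scale to sum to 1 (an assumption of this sketch).
        for (int i = 0; i < linear.length; i++) linear[i] /= sum;
        return linear;
    }

    // Formats a three-element triplet with two decimals.
    static String format(double[] linear) {
        return String.format(Locale.US, "%.2f,%.2f,%.2f", linear[0], linear[1], linear[2]);
    }
}
```

With the triplet -400, -401, -410, a direct Math.pow(10, -400) underflows to 0.0. The renormalized version formats as 0.91,0.09,0.00.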
depristo cfa18f6743 Fixing missed update with new Allele in it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3490 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:56:34 +00:00