Commit Graph

3146 Commits (6ffcaa0afe0ee1a49c9e5d704cc361b0c1b03c72)

Author SHA1 Message Date
delangel 907931c902 a) Update annotations when creating new vcf with Beagle's imputed data. Since genotypes may (will) change based on imputation, several annotations need to be updated. By default, AC, AF, AN and AB will be updated. User can force extra annotations to be updated with -A <annotation> argument.
b) Several cleanups and beautifications.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3499 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 15:12:04 +00:00
chartl 933133ee28 Initial commit of the opposite homozygote classifier. Currently does the following, given a trio vcf:
 + Identifies opposite homozygote sites
 + Identifies the parent from whom it is expected that a null allele was inherited (or whether it was a putative genotype error; e.g. mom=homref, dad=homref, child=homvar)
 + Labels each opposite homozygote with its homozygous region in the child (e.g. region 1, region 2)
 + Labels each opposite homozygote with the size of the homozygous region in which it was found, the number of child homozygotes in the region, and the number of opposite homozygote violations within that region

To come:
 + Classification of sites as likely tri-allelic


Note that this is very experimental



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3498 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 03:56:07 +00:00
hanna 199e4208cd Bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3497 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 00:30:33 +00:00
hanna 52ab9f2417 Feature parity between LocusIteratorByState, DownsamplingLocusIteratorByState, including pushing mrl /
the LocusOverflowTracker into LocusIteratorByState.  Note that the 'Matt Hanna exception' is still enabled
because I haven't yet validated the performance of the DownsamplingLocusIteratorByState when running
without downsampling.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3496 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 22:58:21 +00:00
hanna 5c4d070566 Push Mark's changes in LocusIteratorByState into DownsamplingLocusIteratorByState
in preparation for merging the two into one.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3495 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 17:29:30 +00:00
depristo 6eeb1693ca JEXL2 upgrade. Improvements to JEXL processing including dynamically resolving variable -> value bindings instead of up front adding them to a map. Performance improvements and code cleanup throughout.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3494 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-07 00:33:02 +00:00
hanna c1ecf75dd5 Update to the latest rev of the picard sharding patch. Includes updates reflecting
the imminent move of IlluminaUtil into picard public.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3493 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 20:33:21 +00:00
delangel c503f01dcf More cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3492 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:41:38 +00:00
delangel d4c66d6191 a) Small cleanup
b) Fix major issue with Beagle likelihood converter: if likelihood triplets from UG end up being too low, then Beagle input file will be produced with 0.00,0.00,0.00 triplet. If all samples at a marker have this issue, Beagle will effectively produce junk. To fix, likelihoods are renormalized before converting to linear space.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3491 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-06 17:31:59 +00:00
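The renormalization described in commit d4c66d6191 can be sketched roughly as follows. This is a minimal illustration, not the converter's actual code: the class and method names are hypothetical, and it assumes the likelihoods arrive as a log10 triplet with at least one finite value. Subtracting the maximum before exponentiating keeps the largest term at exactly 1, so a triplet such as -400,-401,-403 no longer underflows to 0.00,0.00,0.00. Dividing by the sum so the values add up to 1 is an extra assumption made here for readability.

```java
public class LikelihoodRenormalizer {
    /**
     * Converts log10 likelihoods to linear space after renormalizing.
     * Hypothetical helper for illustration; assumes a non-empty array
     * containing at least one finite value.
     */
    public static double[] toNormalizedLinear(double[] log10Likelihoods) {
        // Find the largest log10 likelihood.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Likelihoods) {
            max = Math.max(max, v);
        }

        // Shift by the maximum so the largest term becomes 10^0 = 1,
        // then convert each value to linear space.
        double[] linear = new double[log10Likelihoods.length];
        double sum = 0.0;
        for (int i = 0; i < linear.length; i++) {
            linear[i] = Math.pow(10.0, log10Likelihoods[i] - max);
            sum += linear[i];
        }

        // Scale so the values sum to 1 (an assumption of this sketch).
        for (int i = 0; i < linear.length; i++) {
            linear[i] /= sum;
        }
        return linear;
    }
}
```

Without the shift, Math.pow(10.0, -400.0) evaluates to 0.0 in double precision, which is the failure mode the commit message describes.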
depristo cfa18f6743 Fixing missed update with new Allele in it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3490 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:56:34 +00:00
depristo 3ea506fe52 No more new Allele() -- must use create. All simple alleles are now cached for efficiency reasons. VCF4 codec optimizations -- 4x performance in general. Now working in general but hooked up to the ROD system now as VCF4. WARNING -- does not actually work with indels, genotype filters, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3489 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 23:03:55 +00:00
delangel ef47a69c50 a) First fully functional (sort of) version of walker that parses Beagle imputation output files and produces a vcf with imputed genotypes.
More doc/info to follow shortly. Issues still to be solved:
a) Walker changes all genotypes based on Beagle data, but annotations on the original VCF are unchanged. They should in theory be recomputed based on new genotypes.
b) Current implementation is ugly, dirty, unwieldy and will necessitate a refactoring soon so I can keep my pride. Most aesthetically affronting issue right now is that we read the full Beagle files at initialization and keep them in memory, but a more delicate implementation would just read from files on a marker by marker basis. Issue that currently prevents this is that BufferedReader() instances don't seem to play nice when called from the map() function.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3488 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 20:37:25 +00:00
depristo b811e61ae1 Optimized, nearly complete VCF4 reader 2-4x faster than the previous implementation, along with a VCF4 reader performance testing walker that can read 3/4 files, useful for benchmarking
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3487 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 18:11:38 +00:00
aaron 6482b87741 adding the super experimental, half-broken, generally crippled, awkwardly commented, header ignoring vcf4 code. Don't use this, unless you're a developer for VCF4. If so, remove the exception from the constructor so that it won't always exception out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3486 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 07:38:46 +00:00
aaron 0b03e28b60 updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 06:34:26 +00:00
hanna 3d055e3d16 Fail fast if users try to parallelize a read walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3484 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 18:14:33 +00:00
hanna 7d79848f40 Better error message when bam file / list file with wrong extension is
supplied.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3483 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 17:52:48 +00:00
ebanks 597b3744ab Always use phasing info when converting genotypes to strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3482 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-03 17:50:50 +00:00
depristo e2b41082af GATK now does automatic adaptor filtering in locus iterators (but not expt. downsampling iterator). General support for LocusIteratorFilters just like read filters but only applying at particular bases. Updated tools with new MD5 sums due to adaptor bases in their integrationtest data. Note that as a side effect here reads close to each other with odd orientations are also filtered out. Updated minor argument to VariantRecalibrator to change the qStep value on the command line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3481 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 22:26:32 +00:00
aaron 8ec091d6d2 re-enabling regeneration of the tribble index if it's out of date. Also moved the class that can detect text in the log4j stream (useful in testing to make sure appropriate messages are generated).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3480 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:45:51 +00:00
asivache f0c379dde8 Inconsequential changes in report formatting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3479 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 17:43:25 +00:00
weisburd 3ab936181c Supports the join feature of GenomicAnnotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3478 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:29:57 +00:00
weisburd f5f7217413 Implemented joins
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3477 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:28:53 +00:00
weisburd 09c3b15af3 Implemented joins
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3476 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:28:06 +00:00
weisburd e14ae471a0 Refactored some of the small utility methods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3475 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:26:00 +00:00
weisburd 898a78e97d Added toString()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3474 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:24:25 +00:00
weisburd 12c3e3ecda Added back the check for values.size() != header.size(). Now exception will be thrown if number of columns in a record doesn't equal number of columns in the header
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3473 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 16:23:05 +00:00
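The column-count check restored in commit 12c3e3ecda amounts to something like the sketch below. The class name, method name, and exception type are hypothetical choices for illustration. Only the comparison of values.size() against header.size() comes from the commit message.

```java
import java.util.List;

public class RecordValidator {
    /**
     * Throws if a record does not have exactly one value per header column.
     * Hypothetical helper; the real code may use a different exception type.
     */
    public static void checkColumnCount(List<String> header, List<String> values) {
        if (values.size() != header.size()) {
            throw new IllegalArgumentException(
                "Record has " + values.size() + " columns but the header has "
                    + header.size());
        }
    }
}
```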
rpoplin 290771a8c2 Automatic cutting of recalibrated variant calls using ApplyVariantCuts. VariantRecalibrator produces the tranches plot alongside the optimization curve. Specify the levels using -tranche 1.0 -tranche 5.0 etc
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3472 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 15:03:00 +00:00
ebanks 4a555827aa Removing more toUpperCase sanity checks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3471 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 14:38:39 +00:00
ebanks 56e504789a trivial change: toUpperCase no longer necessary
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3470 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 14:00:47 +00:00
rpoplin 87fe60fe4f Fix for Sendu. new Process and p.waitFor() don't seem to work on his farm. Throws an IOException. This was a problem way back with AnalyzeCovariates too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3469 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-02 11:37:10 +00:00
ebanks 7f0c638653 Fix for the indel cleaner: I forgot to "unclip" the cigar string (even though the clipped bases were removed) before using it as an alternate consensus in a particular instance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3468 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-01 02:07:20 +00:00
depristo 21427211c0 Personal MD5 database system now live. WalkerTest now maintains a database of result files associated with MD5 results in integrationtest/, and provides command lines for diff-ing expected to current md5 results when encountering failed integration tests. The suite currently takes 200Mb to store. Update and run integrationtest to build your very own expectation database for future development work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3466 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-31 16:06:16 +00:00
depristo 2b02324587 Support for detecting and automatically excluding reads reading into the adaptor sequence and, if desired, also only showing the first pair when two reads overlap in the fragment. Not enabled, an intermediate check in before updating and verifying the impact on locus walkers everywhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3465 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-30 18:00:12 +00:00
ebanks eb25e41111 minor update to new tribble name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3462 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 20:23:25 +00:00
ebanks ffeb3fd80d Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL was posteriors instead of likelihoods. Not a huge deal because the
priors were flat, but fixed nonetheless.
Also, needed to update Tribble.
Minor updates to the Beagle input maker.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 19:28:26 +00:00
rpoplin 4e268ef6ac Removing the Variant Recalibration Performance test because it isn't ready yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3460 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:27:25 +00:00
rpoplin 522dd7a5b2 Adding the variantrecalibration classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3459 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:21:27 +00:00
rpoplin 2014837f8a VariantOptimizer package is moved to core, renamed as VariantRecalibration, and added to the binary release package. VariantOptimizer walker is renamed to GenerateVariantClustersWalker and ApplyVariantClustersWalker renamed to VariantRecalibrator. Integration tests added, performance tests still to be done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3458 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 18:20:18 +00:00
aaron 871cf0f4f6 Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of:
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class))

you'd say:

@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class))

Which is more in-line with what was done before.  All instances in the existing codebase should be switched over.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:52:44 +00:00
depristo cc2bf549c8 Removing my unnecessary optimization. 10 lines later in the code the same optimization was applied. A monumental waste of time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3455 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:10:48 +00:00
aaron a4d834cc01 fixing the test I broke
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3454 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 02:06:20 +00:00
depristo 6485e8383d Trivial change to retrigger broken build that really isn't broken
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3453 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 23:33:46 +00:00
depristo f2e7582cfc Reorganization of SW code for clarity. Total failure at raw optimization. Discovered that ~50% of reads being cleaned were perfect reference matches. New code comes with flag to look at NM field and not clean perfect matches. Can be turned off with command line option (needed for 1KG bams with bad NM fields). Going to rerun cleaning jobs due to accidental rebuilding of stable codebase and loss of 2 days of runtime.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3452 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 23:16:00 +00:00
aaron e1b0aefb29 fix for parallelism bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3451 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 22:16:14 +00:00
aaron cded9ec985 adding a command line option, -etd (enable threaded debugging), that uses a custom thread pool class to catch exceptions thrown inside of a thread.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3450 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 21:57:56 +00:00
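One common way to build a thread pool that catches exceptions thrown inside a worker thread, as commit cded9ec985 describes for -etd, is to override ThreadPoolExecutor.afterExecute. The sketch below is an assumption about the general technique, not the GATK class itself. The names ExceptionCatchingPool, getFirstError, and runAndCapture are hypothetical.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ExceptionCatchingPool extends ThreadPoolExecutor {
    // Remembers the first failure seen in any worker thread.
    private final AtomicReference<Throwable> firstError = new AtomicReference<>();

    public ExceptionCatchingPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS,
              new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // Tasks passed to submit() store their exception in the Future,
        // so t is null here and the failure has to be unwrapped.
        if (t == null && r instanceof Future<?>) {
            try {
                ((Future<?>) r).get();
            } catch (ExecutionException e) {
                t = e.getCause();
            } catch (CancellationException e) {
                t = e;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (t != null) {
            firstError.compareAndSet(null, t);
        }
    }

    /** Returns the first exception seen in a worker, or null if none. */
    public Throwable getFirstError() {
        return firstError.get();
    }

    /** Runs one task on a fresh pool and returns whatever it threw, if anything. */
    public static Throwable runAndCapture(Runnable task) {
        ExceptionCatchingPool pool = new ExceptionCatchingPool(1);
        pool.submit(task);
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getFirstError();
    }
}
```

A caller could check getFirstError() after the traversal finishes and rethrow it on the main thread, so the failure surfaces with its original stack trace instead of disappearing inside the pool.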
ebanks e2674671e7 The liftover code needs to *hard filter* records whose reference changes (since they no longer adhere to the VCF spec as they don't match the new reference - and can't be converted to VariantContext).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3448 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 19:22:47 +00:00
chartl ff4a0764df Read error rate is now parallelizable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3447 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 19:00:09 +00:00
depristo dfc36c1e95 Restructuring of the mandatory read filters for traversals. Now everything uses ReadFilters, even for the required filters like being mapped for LocusWalkers. Statistics now tracked for each read filter used during the traversal and info emitted in INFO at the end.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3445 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 22:12:25 +00:00
delangel 3873dccb35 First fully functional (though preliminary) version of walker that takes an input VCF and outputs a Beagle .bgl file that can be used for missing genotype calls/haplotype imputation. For now, only supported input format is likelihood format for unrelated individuals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3444 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 21:03:23 +00:00
chartl f9efc1248c VariantEvalWalker now takes indels if you throw the -dels flag. IndelLengthHistogram appears to be working properly, it is turned off by default (as it is experimental) but you can turn it on in your own repository.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3443 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 20:03:14 +00:00
ebanks 058441fa39 Trivial renaming of test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3441 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 16:56:42 +00:00
chartl 0265199ce4 First pass at an IndelLengthHistogram module for variant annotator. Off by default. Will be tested shortly (have to commit, so I can check out in another directory, so that compiling won't kill all my jobs running on LSF)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3440 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 15:04:39 +00:00
aaron a2fab07258 fixed the build problem: there were two copies of the AnnotatorInputTable Codec and Feature in two different spots.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3439 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 14:47:15 +00:00
depristo 5928047d8b Optimization of reference window calculation to use bytes not chars and no uppercasing since reference and read bases are always uppercase now. Should remove some ~5% of runtime of UG.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3438 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 14:10:26 +00:00
chartl 88a06ad81f Changes to Depth of Coverage:
- For speedup in large number of samples, base counts are done on a per read group level, then
   merged into counts on larger partitions (samples, libraries, etc)
   + passed all integration tests before next item
- Added additional summary item, a coverage threshold. Set by (possibly multiple) -ct flags,
   the summary outputs will have columns for "%_bases_covered_to_X"; both per sample, and
   per sample per interval summary files are affected (thus md5s changed for these)

NOTE:

This is the last revision that will include the per-gene summary files. Once DesignFileGenerator is sufficiently general, and has integration tests, it will be moved to core and the per-gene summary from Depth of Coverage will be retired.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3437 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 03:39:22 +00:00
ebanks 0607f76a15 commenting out this test until I can figure out what the hell is going on with the codecs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3436 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 01:12:10 +00:00
rpoplin 062b316881 Better Exception message when can't find annotation value in variant recalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3434 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 21:15:50 +00:00
rpoplin bf530d23de Variant Recalibrator now makes use of a prior on known/novel status as well as on allele frequency spectrum. The VariantOptimizer walker now clusters with all variants but gives more weight to knowns / hapmap / 1KG / MQ1 sites. The weights are all optional command line arguments. We no longer assign default values to annotations that are malformed. The walkers will crash with exception so as to not cover up potential issues. We only produce titv-less clusters now, and so the titv argument in VO was removed and the WithoutTiTv string that gets added to the cluster file is removed. The wiki is updated to show new example commands.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3433 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 21:08:31 +00:00
ebanks ae6c014884 Fixed UG parallelization bug. Better integration test to catch this in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3432 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 21:03:45 +00:00
ebanks 434e920da9 Oops, forgot to update integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3431 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 20:37:45 +00:00
ebanks 772f558ae0 Massive change to the indel realigner code. We now properly deal with soft-clipped reads. Also, improved left-alignment code.
Small change for Ryan to get hard-clipped reads working for the recalibrator.

PLEASE DO NOT RELEASE THIS WEEK.  I still have some more testing to do and need Mark to run WG jobs.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3430 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 20:04:33 +00:00
aaron f3e2aae570 add experimental support for tabix files (for any of our Tribble rod types), as long as they end in .gz and can be read by the tabix reader.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3429 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-25 04:44:46 +00:00
weisburd 8db7c97c4d Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3427 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:38:54 +00:00
weisburd 4aa749c709 Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3426 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:38:07 +00:00
weisburd aca3bcb193 Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3425 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:37:17 +00:00
weisburd 64ed770250 Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3424 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:36:28 +00:00
hanna ee3f2eb1d0 Don't output traversal reduce result in the logger. In many cases, the reduce
result is tangential to the product of the analysis and having the logger always
emit it can confuse the output (such as in the new reduceByInterval 
DepthOfCoverage walker).  If users want to emit it, they can choose not to override
onTraversalDone, or override onTraversalDone and write results to the output
stream / logger / whatever their choice.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3422 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-23 22:41:43 +00:00
hanna a40e64e47b A downsampling validator. Compares the generated pileup passed in from the alignment context to the reads,
passed in as a Tribble SAM text feature.  If the generated pileup contains a valid set of reads according to
the downsampling rules, the test passes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3421 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-23 21:49:54 +00:00
delangel a280a0ff0d a) Made HaplotypeScore default annotation. This changed several integration tests, whose MD5 is now updated.
b) Disabled BaseQualRankSumTest, the returned p-values differ wildly from Matlab/R-provided ones, cause TBD.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3419 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 22:25:17 +00:00
hanna b10950c691 Simple performance optimization -- cache the number of reads in the locus hanger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3417 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 19:26:16 +00:00
delangel 355396109b Bug fix to avoid build failure (class changed under me??)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3416 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 18:48:56 +00:00
delangel 1753d07b02 Added AnnotationByAlleleFrequencyWalker - walker takes an input vcf, a reference vcf and a list of annotations (with the -A argument). For each site present in both VCFs, it outputs the given annotations to the screen as well as allele frequency. Since HapMap vcf reference doesn't include AF in annotations, it computes it from Chromosome, Het and HomVar counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3415 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 18:31:34 +00:00
chartl 745d7c582f added integration test for intervals with no coverage due to filtering
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3414 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 16:52:42 +00:00
chartl 7fb3f2d3eb Annotator now buffers indel calls (prevents double-output from double-calls to map)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3413 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 16:34:34 +00:00
chartl 4e834b5e35 VFW now uses a ref window and thus is compatible with indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3412 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 15:59:42 +00:00
chartl 88cb93cc3c Changes to Depth of Coverage (added maximum base and mapping quality flags; with new integration tests -- because they use b36, and the other test uses hg18, it's in a different class (integration test system can't change refs on the fly)). Initial change to VariantAnnotator to allow it to see extended event pileups; you currently have to throw the -dels flag; and it's specified as "very experimental". Yet, all the integration tests pass.
Homopolymer Run now does the "right" thing (e.g. single bases are represented as HRun = 0 rather than HRun = 1) for indels. AlleleBalance now does something close enough to correct.

Added a convenience method to VariantContext that will return the indel length (or lengths if a site is not biallelic).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3409 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 13:02:01 +00:00
depristo 6faf101c6c Minor improvements to Callable Loci for public consumption
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3408 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 12:50:11 +00:00
hanna 388dd8d64d Fixing bugs in downsampler introduced when I added Ryan's dup eliminator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3407 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 02:53:12 +00:00
depristo a10fca0d5c Genotyper now is using bytes not chars. Passes all tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 21:02:44 +00:00
hanna 7389077b3b A few misc usability fixes:
- Clarify the message emitted when -XL is supplied so I don't spend another half day chasing a bug that doesn't exist.  
- Crash with a helpful message when running -nt with non-TreeReducible walkers.
- Crash with a helpful message when running -nt with reduceByInterval walkers.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3405 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 19:02:02 +00:00
aaron b543dd4ac4 more aggressive checks for the locking, and some more documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3404 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 16:16:36 +00:00
depristo 1ab00e5895 Retiring multi-sample genotyper
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3401 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 14:10:56 +00:00
depristo 727822adb4 BaseUtils has more clear distinction between byte and char routines. All char routines are @Deprecated now. Please use bytes. Better organization of reverse(), now in Utils not BaseUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3400 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 14:05:13 +00:00
depristo 6ce3835622 Removing unused methods in QualityUtils; ReferenceContext now converting all bases to upper case, but can be disabled with static boolean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3399 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 12:38:06 +00:00
depristo 5abac5c057 A few more char -> byte cleanups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3398 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 00:02:06 +00:00
depristo 8a725b6c93 Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[] backed not char[]. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 23:27:55 +00:00
aaron 02cc1afdc8 remove RodBed and all its dependencies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3396 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 19:12:30 +00:00
chartl ffb1b46166 Added a GCCalculatorWalker for a oneoff analysis for Mark Daly (GC content of agilent 1.1 targets)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3395 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 18:49:51 +00:00
aaron 0036df7b03 adding a convenience method for getting at the RODs that overlap a specific location as GATKFeatures.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3394 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 17:40:20 +00:00
aaron ca386439be only emit a warning if the tribble index is out of date, don't remove and replace it for them. Added a test case where the log4j appender checks the logging messages for the appropriate output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3393 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 15:12:48 +00:00
hanna 017ab6b690 Experimental versions of downsampler and Ryan's deduper are now available either
as walker attributes or from the command-line.  Not ready yet!  Downsampling/deduping 
works in a general sense, but this approach has not been completely optimized or validated.
Use with caution.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3392 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 05:40:05 +00:00
weisburd 46ba88018d Updated to the new readHeader(..) api
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3391 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 04:06:34 +00:00
weisburd 984c51efd3 Updated to use Tribble-based GATKFeature instead of TabularROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3390 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:42:12 +00:00
weisburd 42ee16f256 Updated to use Tribble-based GATKFeature instead of TabularROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3389 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:41:37 +00:00
weisburd d8469e2fba Updated to use Tribble-based GATKFeature instead of TabularROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3388 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:40:47 +00:00
weisburd d65b2d32d1 Removed AnnotatorROD which has been ported to Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3387 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:39:34 +00:00
weisburd b82116f488 Removed AnnotatorROD which has been ported to Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3386 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:39:20 +00:00
weisburd 6b96f025f5 Tribble integration for indexing the AnnotatorInputTable format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3385 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:37:54 +00:00
weisburd 2f3933148d Added fast split(str, delimiter) method
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3384 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:37:26 +00:00
hanna aedb9f6734 Bring SAMPileupCodec into compliance with new interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3383 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 01:23:29 +00:00
aaron 7cfb9ff3dc updates for Tribble 82, fixes for Ryan's case where multiple processes would attempt to read/write to the same index, and a couple other Tribble-centric bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3382 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 19:34:45 +00:00
chartl 635f61c22d Clone the other guy too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3381 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 18:56:01 +00:00
rpoplin 9e15299475 Misc cleanup in variant recalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3380 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 17:37:01 +00:00
chartl eb200e4cce Hrumph. Don't just add pointers to the same objects, actually clone the underlying arrays.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3379 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 17:13:44 +00:00
chartl e016491a3d Major refactoring of Depth of Coverage to allow for more extensible partitions of data (now can do read group, sample, and library; in any combination; adding more is fairly easy). Changed the by-gene code to use clones of stats objects, rather than munging the interval DoCs. (Fix for Avinash. Who, hilariously, thinks my name is Carl.) Added sorting methods to ensure static ordering of header and body fields.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3377 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 16:58:13 +00:00
weisburd 3c022e4b0c Improved command-line-arg validation at startup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3374 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:46:17 +00:00
weisburd 35b4bba35e Refactored so it could be used for knownGene and CCDS as well as refGene
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3372 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:44:10 +00:00
weisburd bb86c0e03a Improved error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3371 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:43:13 +00:00
weisburd 68719615be For multiple matches, shifted counter to be 1-based
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3370 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:41:50 +00:00
hanna 73e2e32837 Fix typo.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3369 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:04:00 +00:00
chartl ebd0fabf86 First pass updates to annotations to work with indels. HomopolymerRun indel behavior is currently turned off by a global boolean until it's ready to go live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3368 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:02:13 +00:00
hanna 0791beab8f Checking in downsampling iterator alongside LocusIteratorByState, and removing
the reference implementation.  Also implemented a heap size monitor that can
be used to programmatically report the current heap size.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3367 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:00:44 +00:00
chartl b7d21627ab Changes to DepthOfCoverage (JIRA items) and added back an integration test to cover it. Alterations to the design file generator to output all transcripts (rather than choosing one at random).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3366 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 17:23:00 +00:00
kiran 4235164359 Removed the confusionMatrix column (of *course* this is a confusion matrix... what else would it be?!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3365 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-14 21:55:37 +00:00
kiran 95b29f608b Specify default values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3364 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-14 21:42:53 +00:00
rpoplin 6efd05831b Encapsulating annotation decoding function in order to use same fixed random seed in both VariantOptimizer and ApplyVariantClusters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3363 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-14 20:03:38 +00:00
ebanks 32389dc0a9 Fixed GQ estimate when chosen genotype isn't the most likely according to the GLs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3362 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-14 19:17:46 +00:00
depristo 1538dc0144 optimizer now uses -an arguments instead of exclude and force for clarity. command-line length reduced by 50%
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3361 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-14 15:41:44 +00:00
hanna 88bd7a2045 Reenabling UG parallelization performance tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3360 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 16:28:08 +00:00
hanna 0490909285 Fixed epic generic paths fail.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3359 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 15:59:57 +00:00
hanna 7ef87e5126 An integration test based on validating pileup to test parallelism in reads, reference, and RODs. This test runs in less
than a minute and fell over instantly in the case of the Tribble parallelism issue.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3358 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 15:40:43 +00:00
hanna ceec525420 Got rid of stray unicode characters in copyright message.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3357 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 14:47:39 +00:00
hanna 3e9ad4bbd0 Porting SAM pileup ROD to Tribble as a case study.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3356 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-13 00:22:59 +00:00
aaron 6839c194cb although holding on to memories can be fun, it's bound to hurt performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3355 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 19:26:58 +00:00
ebanks c81b910f73 Commenting out the parallelization test which is failing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3354 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 18:39:53 +00:00
aaron cac98ba5ef a couple of small documentation fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3353 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 17:40:27 +00:00
depristo 3f07611187 Added support for -nSamples to varianteval (and getNSamplesForEval function). Allows you to calculate AC based metrics for files without genotypes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3350 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 13:36:31 +00:00
aaron 2c55ac1374 fixes for parallel processing problems with Tribble, a small bug in the resource pool, and some more documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3349 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 06:13:26 +00:00
hanna 6868ce988f Fix hanging bug reported by Susanne Pfeifer (tiffy @ get satisfaction) where, if the last read(s) in a shard all have an
indel in roughly the same location and that indel isn't covered by any other reads, LocusIteratorByState goes into an infinite
loop.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3348 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 17:31:19 +00:00
ebanks 34969f304c Adding dbsnp to all UG performance tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3347 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 15:48:05 +00:00
ebanks 140e43b93b Checking in to see whether it fails. If I start getting bombarded with Bamboo error reports, I'm commenting it out...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3346 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 15:39:42 +00:00
ebanks 572b383fe2 Make VA annotate dbsnp again
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3345 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 14:06:53 +00:00
rpoplin b09e7231d1 A quick implementation of the experimental covariates for the TGen folks to work with.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3344 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-11 01:08:52 +00:00
kiran aec5f7b630 Can now threshold results based on minimum base and/or mapping quality.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3343 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 19:58:07 +00:00
kiran 13fd182b7c For dealing with slightly malformatted BAMs - mark every alignment as primary, or in the case of some BAM files from UWash, supply the sample information for each read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3340 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 15:17:05 +00:00
kiran 4a7902bb8e Bases 'A' and 'a' (etc.) no longer considered different.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3339 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 14:53:38 +00:00
kiran ec543b7b62 The Complete Genomics confusion matrix rates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3338 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 14:52:10 +00:00
kiran b223b04331 Don't list '.' as an alternate allele, dummy!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3337 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 14:51:18 +00:00
kiran 98718d0faa Computes the error rate per cycle
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3336 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 14:50:22 +00:00
kiran 7527f950d1 Computes the quality score distribution per readgroup (one column per readgroup)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3335 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 14:49:38 +00:00
kiran c111c15072 Computes the distribution of insert size per library (for now, one output file per library)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3334 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 14:48:35 +00:00
ebanks a51bd57566 First version of the smart batch merging tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3333 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 02:18:48 +00:00
rpoplin 33a9549896 Variant Optimizer accepts a dbSNP rod argument to use in determining known/novel status as opposed to using the rsID in the vcf record. VO generates plots of annotation values used in clustering broken out by knowns and novels. Useful for showing which annotations are approximately Gaussian.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3332 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-09 16:48:07 +00:00
hanna 76efa757f0 Switched over to reviewed version of Picard patch. In process, did some optimization to the IntervalSharder
which improved startup time 5-10x when dynamically merging many BAMs.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3331 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-08 14:12:22 +00:00
depristo 504103bd15 Misc. additions to correct utilities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3329 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 21:34:18 +00:00
depristo 64ccaa4c6a Walkers and integration tests that calculate and compare callable bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3328 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 21:33:47 +00:00
depristo d070554329 A walker that calculates read lengths, number and size of clipping events
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3327 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 21:32:51 +00:00
chartl 1749a49042 Mapping and base quality thresholds for DoC default to none
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3326 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 18:08:13 +00:00
aaron 7d2df3f511 example windowed ROD walker for Kristian, and updates to Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3325 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 17:12:50 +00:00
rpoplin 57f254b13a VE integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3324 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 13:58:25 +00:00
ebanks 44de92e09d Checking in the liftover script. I am including a post-processing walker to filter out bad records written in under 10 minutes as per my agreement with Mark.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3321 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 12:31:56 +00:00
ebanks 18f1d31a22 Moving to and organizing in core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3320 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 04:05:36 +00:00
aaron 06ea65e60b again for JIRA GSA-320
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3319 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 03:47:58 +00:00
aaron ac9b32db88 a bug fix for Kiran; putting JIRA in for better type determination system for the new Tribble tracks so this doesn't happen again.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3318 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 03:31:43 +00:00
hanna 4e0019b04f Repair code that sorts and merges intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3317 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 22:37:25 +00:00
aaron 72e030a670 require that snps be biallelic before we pass them to the TiTv calculation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3316 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 22:33:00 +00:00
rpoplin 7cecec7d00 Removing zero no-calls restriction in AC stats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3314 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 18:55:07 +00:00
ebanks 0e58fb7cc0 Moved over to be a walker inside the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3313 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 18:28:03 +00:00
aaron 78409dca0d turned off the progress output from tribble when making an index, and fixing a case where the index file isn't writable so we instead make the index in memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3312 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 16:36:58 +00:00
ebanks bacc507a48 Don't worry about sorting anymore in the liftover tool. That will come later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3311 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 15:00:30 +00:00
ebanks 5df0361bd2 trivial removal of unnecessary comments
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3309 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 03:51:14 +00:00
ebanks 2975e3a4e8 picard Intervals don't sort right - switching to GenomeLocs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3308 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-06 03:50:28 +00:00
ebanks 1a99fb9318 First pass at liftover tool. Passing buck over to Aaron...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3306 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 20:38:19 +00:00
aaron a0d71540df speed-up for VCF, adding code to the VCF reader to automagically make an index if one doesn't already exist, and a change to the VCF writer unit test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3305 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 20:19:42 +00:00
aaron 6bbcc47b5d removing some out-of-date RODs and some unused genotype writer formats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3304 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 19:07:13 +00:00
aaron c998c48a23 adding code to detect out-of-date index files, which we now remove and regenerate if the target file is newer than the index file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3303 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 17:55:36 +00:00
aaron a68f3b2e9c VCF moved over to tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3302 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 17:28:48 +00:00
aaron ad11201235 adding more ROD pile-up tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3301 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 16:01:11 +00:00
asivache 0338345bee Fixing the issue with reads having insertion immediately followed by a S/H cigar element causing out of window error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3300 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 15:42:27 +00:00
ebanks 64640d6b17 Complete the switch statement to deal with all possible cigar operators for Kris.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3299 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 13:41:05 +00:00
aaron f75e54e3f7 fixes for new package names in tribble 74
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3298 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 05:47:04 +00:00
chartl 617542853f Walker that can be used with refGene and a TCGA bed file to annotate intervals in an interval list with the genes and exons they overlap.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3296 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 02:55:01 +00:00
chartl 354262eabe New convenience methods to rodRefSeq for dealing with intervals that may be a superset of multiple exons. Needed for next commit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3295 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 02:54:18 +00:00
ebanks 03bea70f3a Fixed edge case bug in cleaner: when no -L argument is used and a target interval abuts the end of the reference genome, we'll NullPointer at the first unmapped read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3293 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-04 16:49:21 +00:00
kiran 510b3efcc2 Fixed an issue where asking for the alternate alleles at hom-ref sites would result in an array out-of-bounds exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3292 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 18:46:33 +00:00
sjia 94b51de401 HLA caller updated to examine class II loci, updated pointers to dictionary, allele frequencies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3290 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 14:54:52 +00:00
rpoplin 97fdd92e7b Clean up the code to have a unified approach to calculating p(true) for both with and without ti/tv models
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3289 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 13:30:20 +00:00
aaron f497213933 DbSNP moved over to tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3288 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 06:02:35 +00:00
rpoplin 9d01670f62 Major update to the Variant Optimizer. It now performs clustering for both the titv and titv-less models simultaneously, outputting the cluster files at every iteration. It makes use of the Jama matrix library to do full inverse and determinant calculation for the covariance matrix where before it was using only approximations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3286 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-02 19:21:23 +00:00
weisburd a318b1871d Removed unused column
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3285 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 21:29:34 +00:00
ebanks 9dff578706 Added PG tag to bam header to let people know it's been cleaned.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3284 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 17:30:30 +00:00
ebanks 0e10359a5e Okay, finished up the ability to cap a base's qual by its read's mapping quality.
This is experimental - I have not tested its performance on SNP calling, or even played around with it.  If you want to test it out, go nuts.  But don't come running to me if your results are not good.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3282 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 16:58:30 +00:00
ebanks 850f36aa61 Changes to the Unified Genotyper's arguments:
1. User can specify 4 confidence thresholds: for calling vs. emitting and at standard vs. 'trigger' sites.
2. User can cap the base quality by the read's mapping quality (not done yet).
3. Default confidence threshold is now Q30.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3281 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 16:44:24 +00:00
weisburd 8b2ce128b5 Optimized the join(..) method.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3280 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 15:55:07 +00:00
hanna 8bb15ef812 Checking in the reference implementation of the downsampler for back comparison.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3278 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 15:41:13 +00:00
ebanks 1714c322c2 Reorg of UG args; checking in first before upcoming changes that will break integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3274 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 14:48:46 +00:00
weisburd ba78d146ec Finished implementing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3273 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 14:14:31 +00:00
weisburd 5d5c7f9d34 Changed short code of stop codon to 'stop'
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3272 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 13:55:52 +00:00
aaron cbed0b1ade Adding GeliText tribble track as the first enabled Tribble track. This means 'Variants' is no longer valid for a ROD type, use GeliText instead. I've updated all the references in the codebase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3271 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 22:50:17 +00:00
aaron 7fbfd34315 adding the GELI ROD validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3270 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 21:43:00 +00:00
chartl 82818a417b Allow header fields to come in any order...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3269 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 18:33:10 +00:00
hanna 4617abf1ff Fix bug in the interval sharder in cases where contigs specified in intervals are not present in any supplied BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3268 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-28 20:42:04 +00:00
chartl e2ff4167af Added "#Family ID" as a possible header value for PlinkRod ... since that's in the new sequenom headers for pilot 3 validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3266 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-28 18:38:33 +00:00
depristo 5dce16a8f1 Better genotype concordance module. Code refactoring for clarity (please see below/after for educational purposes). Now reports variant sensitivity, concordance, and genotype error rate by default. Also aggregates this data across all samples, so you get a per sample and overall stats for each of these in the allSamples row.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3265 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-28 13:10:11 +00:00
aaron 64c5f287c5 fixes for edge-cases when using reflections to find classes outside of the main jar. Will push as a patch to reflections
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3264 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-27 17:46:46 +00:00
aaron c647153b10 Adding Jama for Ryan.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3262 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-27 14:30:36 +00:00
aaron f6468f9143 a fix for a bug we've worked around in the reflections package: previously it didn't find classes that weren't in the main jar. Fixed in this version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3261 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-27 04:49:49 +00:00
ebanks df31eeff9f minor change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3259 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 06:05:29 +00:00
aaron 68bdac254b a utility walker for validating changes made to the underlying ROD system in the transition to Tribble.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3258 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 05:21:24 +00:00
ebanks d9bf441391 Have UG emit calls at sites from one or more 'trigger' tracks when provided
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3257 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 05:04:43 +00:00
ebanks 8f2bfac7a6 Bug fix for NullPointerException
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3256 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 05:02:09 +00:00
ebanks f5a3b128c8 Fixing bug that's not caught by integration tests:
If the first eval seen has one or more no-calls, then that's the 2N chromosome count that gets set as the max for the metrics.  Instead, just check that any eval's no-call count is 0.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3255 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 02:40:34 +00:00
depristo 29ab59a7b3 Bug fix for Kiran; insertions now get a null reference allele even if the ref input object is null
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3254 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-24 21:31:03 +00:00
aaron c8d09a29ed some quick changes to the VE output system - more to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3253 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 21:55:08 +00:00
depristo 7f4d5d9973 Ti/Tv by AC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3252 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 17:56:29 +00:00
ebanks 42bcca1010 Pulling out the left-alignment code for indels so that other walkers can use it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3251 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 16:12:34 +00:00
weisburd 9e28e4eb42
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3250 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 15:50:09 +00:00
weisburd 10bcd72593 1st attempt to implement extra columns
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3249 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 15:49:37 +00:00
weisburd a72a5a7b1a Data object for representing a single amino acid
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3248 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 15:49:06 +00:00
rpoplin e7c0ded40e Fixed long-standing bug in GenotypeConcordance module of VariantEval which caused incorrect numbers to be displayed in the concordance table. The format of the concordance table has changed. Added a concordance summary table which gives overall genotype concordance summary stats by sample. None of the VE integration tests contained genotype information so I added a comp track with genotypes to one of the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3247 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 15:48:41 +00:00
ebanks e0b51d0df0 Trigger cleaning of duplicate reads. Also better debug output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3246 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 15:12:28 +00:00
ebanks 3adf7fbf64 bug fix for known-indels used as consensuses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3245 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 13:52:51 +00:00
aaron f050beada6 make sure we do delete the temp file we create
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3244 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 05:32:49 +00:00
aaron 536f22f3bd adding VC adaptor for GELI, along with unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3243 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 05:28:39 +00:00
depristo 3d2c836db6 Bug fix for case sensitivity
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3242 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-23 03:08:58 +00:00
ebanks 8c94df6f00 Bug fix for Chris: deal with sites that have "semi-deletions"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3241 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 18:34:41 +00:00
chartl 121163dd49 interim commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3240 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 13:44:45 +00:00
weisburd f0fe2ea530 A simple codon -> AA lookup table
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3239 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 12:18:00 +00:00
weisburd e643a9e7a5 Takes a refGene table ( -B arg must be: -B refgene,AnnotatorInfoTable,/path/to/refgene_file.txt) and generates the big table of nucleotides containing annotations for each possible variant at each transcript position (e.g. 4 variants for each position).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3238 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 12:11:19 +00:00
weisburd 653e08c0b6 Takes a refGene table ( -B arg must be: -B refgene,AnnotatorInfoTable,/path/to/refgene_file.txt) and generates the big table of nucleotides containing annotations for each possible variant at each transcript position (e.g. 4 variants for each position).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3237 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 12:11:03 +00:00
weisburd 20379c3f82 Added location-caching optimization, temporary attributes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3236 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 11:35:45 +00:00
ebanks 84ebceb9a6 Fix for Chris: need to use the appropriate conversion method. Added a warning to the adaptor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3235 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 02:05:10 +00:00
chartl e7334ec11f Checkin for Eric (IndelDBRateWalker is a prelude to a VariantEval module for comparisons for indels)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3234 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 00:40:27 +00:00
hanna 32d86cf457 Rev the reservoir downsampler to support partitioning through a functor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3232 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 19:50:26 +00:00
asivache ef6d900eb8 for now, set log error to -1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3231 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 19:21:06 +00:00
ebanks e9e844fbf5 1. Reverting: dbsnp automatically is a comp
2. Fixing logic for min Qscore calculation


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3230 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 18:51:35 +00:00
asivache 532263ea25 Oooops, forgot to update the test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3229 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 18:38:24 +00:00
asivache 1373fee278 Because of the ugly VCF format, the generic addCall() method of the GenotypeWriter interface acquired an additional parameter: an explicitly specified reference base (in VCF it's the base immediately *before* the event in case of indels, so we have to pass it). All implementing classes are modified to accommodate the change.
VCFGenotypeWriterAdapter now explicitly uses the passed reference base instead of deriving it from VariantContext (in SNP mode as well!); other writers simply ignore that additional argument.

SimpleIndelCalculationModel now WORKS (or rather, it does produce calls :) )

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3228 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 18:19:03 +00:00
hanna ab34397d2e Continuing to stamp out the non-ASCII copyright virus.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3227 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 14:50:45 +00:00
chartl 84f1ccd6ac Two dumb one-off walkers written to fix & annotate the Baylor indel calls (which came in sans reference, and without coding/intron annotations).
ERIC -- does the IndelAnnotator (the RefSeq lookup code I stole from IndelGenotyperV2) want to be its own Annotation inside VariantAnnotator? Is Andrey already doing this as part of adding indel calling to UG?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3226 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 14:04:10 +00:00
depristo 2fdc1cf490 Bed ROD track support
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3225 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 13:22:42 +00:00
depristo 51b3998082 deleting unused code from VariantFiltration
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3224 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 13:22:19 +00:00
ebanks 4abd3b0b7b Fixing known/novel calc now that dbsnp isn't a default comp track
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3223 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 05:43:59 +00:00
ebanks 114819d980 Allow user to set min confidence score for comp tracks too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3222 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 05:09:09 +00:00
ebanks 3db73e0791 Renaming for consistency
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3221 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 03:00:43 +00:00
ebanks 3b5673d967 1. Removed -all; by default all modules are used; use -none for no modules.
2. Don't make dbsnp track be a comp by default (to cut back on output). Please let me know if someone wants this back for some reason.
3. Cleaned up dbsnp module output to print the right numbers.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3220 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 02:46:42 +00:00
aaron 4e18c54bb8 fixing a couple of commented out portions of the VCFReader test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3219 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 22:20:35 +00:00
asivache 6fda78f93f Always return deleted bases in upper case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3218 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 19:17:40 +00:00
asivache 52a570637d Always keep event bases in upper case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3217 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 19:16:39 +00:00
aaron 80c4f88a72 removing the Variation interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3216 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 18:56:45 +00:00
asivache 7d952a34ae Fixing copyright note
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3215 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 18:28:57 +00:00
asivache cdc175f7e3 Synchronizing version to make sure everything compiles; this model is not operational yet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3214 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 17:41:52 +00:00
asivache 4437456bb5 Pass array of ref bases to callExtendedLocus()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3213 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 17:41:13 +00:00
asivache 5d2fab93f4 Method signature changed: for extended events, pass array of reference bases (to ensure we cover the full length of the indel event), not just reference base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3212 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 17:40:30 +00:00
asivache 01e6492ba9 Updated to work correctly with extended pileups. Clogged and uses some dirty tricks; pileups/extended pileups need to be redesigned someday
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3211 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 17:38:09 +00:00
asivache 4723cad1be New method: getBasesAtLocus(int n); for the windowed reference context, this method extracts n bases starting at the current locus (NOT at the window start, so this method is an extension of getBase())
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3210 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 17:35:09 +00:00
asivache cac125b35c Fixed incorrect symbol printed into the output file (tag had 'R', should have had 'T')
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3209 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 16:37:28 +00:00
rpoplin f4977965b6 Removing debug statements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3208 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 16:22:40 +00:00
rpoplin 124b7a2a58 Moved ApplyVariantClusters over to VariantContext
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3207 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 16:20:25 +00:00