aaron
f4cfb0f990
The first step in integrating Jim's tree-based index scheme:
...
- changed to a better method for getting headers from Codecs
- some removal of old commented-out code in the GATKArgumentCollection
- changes for the rename of FeatureReader to FeatureSource
- removed the old Beagle ROD
- cleaned up some of the code in SampleUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-19 04:49:27 +00:00
delangel
55b756f1cc
First step in major cleanup/redo of VCF functionality. Specifically, now:
...
a) The VCF track name works again with VCF 3.3 or 4.0 files when specifying -B name,VCF,file. The code reads the header and automatically parses the version.
b) The old VCF codec is deprecated. The reader now goes directly from parsing VCF lines to producing VariantContext objects, with no intermediate VCF records. If anyone can't resist the urge to keep reading files the old way, a new VCF3Codec holds the old code, but it will eventually be deleted.
c) VCF headers and VCF info fields no longer keep track of the version. They are parsed into an internal representation and will be output only in VCF4.0 format.
d) As a consequence, the existing GATK bug where files are produced with a VCF4 body but VCF3.3 headers is fixed.
e) Several VCF 4.0 writer bugs are now fixed.
f) Integration test MD5s have changed, mostly because of corrected VCF4.0 headers and because the validation data now mostly uses VCF4.0.
g) Several VCF files in the ValidationData/ directory have been converted to VCF 4.0 format. I kept the old versions, and the new versions have a .vcf4 extension.
Pending issues:
a) We are still not dealing with indels consistently or correctly when representing them. This will be a second part of the changes.
b) The VCF writer doesn't use VCFRecord but it does still use a lot of leftovers like VCFGenotypeEncoding, VCFGenotypeRecord, etc. This needs to be simplified and cleaned.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3813 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 22:49:16 +00:00
chartl
75bea4881a
Modified SampleFilter to allow for multiple samples to be given. AminoAcidTransition now turns on when you give VariantEval the right commands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3812 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-16 21:27:32 +00:00
ebanks
af23762778
Removing more references to VCFRecord
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3789 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 11:54:23 +00:00
ebanks
460283f6d2
No more manually converting VariantContexts to VCFRecords. You should be utilizing VCs and not VCFRecords.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3787 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 05:21:28 +00:00
ebanks
6b5c88d4d6
The GATK no longer writes vcf3.3; welcome to the world of vcf4.0. Needed to fix a few output bugs to get this to work, but it's looking great. Much more still to come. Guillermo: hopefully this doesn't break your local build too badly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3786 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-14 04:56:58 +00:00
chartl
9d2a485532
Update to AminoAcidTransition eval module
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3783 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 17:12:03 +00:00
ebanks
6442dabf94
Deleting/archiving as instructed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3779 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-13 15:23:50 +00:00
ebanks
221e01fb27
deleting/archiving as instructed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3765 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-12 16:59:45 +00:00
aaron
3347d1ca7c
part one of combining format and info header lines code into a single abstract class for Mark; plus some 'm' removals from access methods for Eric. Adding fixes for CombineVariants next.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3719 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-05 05:57:58 +00:00
delangel
b6bdd61283
a) Fixed a bug in writing a VCF4.0 variant context when a multi-base reference is homopolymeric: the number of trailing bases was computed incorrectly, so we ended up with an incorrect position.
...
b) Updated VCF4WriterTestWalker to take either VCF3 or VCF4 as inputs (this walker can also be used to convert from 3.3 to 4.0).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3711 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-02 15:19:42 +00:00
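A rough illustration of the trailing-base computation mentioned in b6bdd61283. This is a minimal sketch, not the GATK writer code: the class and method names are hypothetical, and the rule that every allele must keep at least one base is an assumption about how such clipping is usually bounded. It counts the bases shared at the end of the reference and all alternate alleles.

```java
/** Hypothetical helper for illustration; not part of the GATK VCF writer. */
final class TrailingBaseSketch {
    /**
     * Counts the bases shared at the end of the reference and every alternate
     * allele, never clipping so far that the shortest allele becomes empty.
     */
    static int sharedTrailingBases(String ref, String... alts) {
        int shortest = ref.length();
        for (String alt : alts) {
            shortest = Math.min(shortest, alt.length());
        }
        // Assumption: each allele must keep at least one base after clipping.
        int limit = Math.max(shortest - 1, 0);

        int clipped = 0;
        while (clipped < limit) {
            char refBase = ref.charAt(ref.length() - 1 - clipped);
            boolean allMatch = true;
            for (String alt : alts) {
                if (alt.charAt(alt.length() - 1 - clipped) != refBase) {
                    allMatch = false;
                    break;
                }
            }
            if (!allMatch) {
                break;
            }
            clipped++;
        }
        return clipped;
    }
}
```

With a homopolymeric reference such as "AAA" and alternate "A", an unbounded suffix comparison would consume the whole alternate allele; the cap keeps the count at 0 in this sketch, while "ACGT" against "AGT" yields 2.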
hanna
4995950d04
IndexedFastaSequenceFile is now in Picard; transitioning to that implementation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3701 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 04:40:31 +00:00
hanna
c9d5345150
Redo StratifiedAlignmentContext to use ReadBackedPileup's stratification options.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3699 348d0f76-0448-11de-a6fe-93d51630548a
2010-07-01 02:46:05 +00:00
chartl
610cc7ae2b
Cool package trick Kiran showed me. VariantEvaluator no longer public, AAT specifies the core package even though it lives in oneoffs. Disabled so integration tests pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3677 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 22:42:04 +00:00
chartl
9ac13b8f5d
Name and body change for this module to reflect local code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3675 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:45:26 +00:00
aaron
844cb2ed33
fixing a bug that Eric found with RODs for reads, where some records could be omitted. Sorry Eric!
...
Also putting more tolerance into the timing on the Tribble index tests (that check to make sure we're deleting out-of-date indexes, and not deleting perfectly good indexes). It seems that some of the farm nodes aren't great with a stopwatch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3674 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:38:55 +00:00
chartl
101c27294d
Comment this guy out so we build again. (Hate it when my repository goes all funky.)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3673 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:16:33 +00:00
chartl
3017f82550
Initial commit of items for analyzing amino acid transitions in variant eval. Blew up my subversion by coding locally while I did not have internet. I hope this doesn't bust any integration tests since I changed no existing code but...who knows. Crossing my fingers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3672 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 20:57:18 +00:00
delangel
3ca2b7374b
Fixes to better deal with the "Type" and "Number" fields in the INFO and FORMAT header lines in VCF4.0. We now record these fields and provide appropriate conversions. This is the first version that fully passes the VCF validator.
...
Also, moved the flag indicating VCF4.0 to the VCFWriter constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3669 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 16:43:00 +00:00
delangel
ed71e53dd4
1) Initial complete version of VCF4 writer. There are still issues (see below) but at least this version is fully functional. It incorporates getting rid of intermediate VCFRecord so we now operate from VariantContext objects directly to VCF 4.0 output.
...
See VCF4WriterTestWalker for usage example: it just amounts to adding
vcfWriter.add(vc,ref.getBases()) in walker.
add() method in VCFWriter is polymorphic and can also take a VCFRecord, although eventually this should be obsolete.
addRecord is still supported so all backward compatibility is maintained.
Resulting VCF4.0 are still not perfect, so additional changes are in progress. Specifically:
a) INFO codes of length 0 (e.g. HM, DB) are not emitted correctly (they should emit just "HM" but now they emit "HM=1").
b) Genotype values that are specified as Integer in header are ignored in type and are printed out as Doubles.
Both issues should be corrected with better header parsing.
2) Check in ability of Beagle to mask an additional percentage of genotype likelihoods (0 by default), for testing purposes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3664 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 23:54:38 +00:00
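Issue (a) in ed71e53dd4, where zero-length INFO codes should be written as a bare key ("HM") and not as "HM=1", can be sketched as below. This is an illustrative sketch only: InfoFieldSketch and its convention of marking a flag with a null value are assumptions made here, not the VCFWriter implementation.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of the GATK VCFWriter API. */
final class InfoFieldSketch {
    /**
     * Renders INFO entries in map iteration order. A null value marks a flag
     * (declared length 0), written as the bare key; all other entries are
     * written as KEY=VALUE and joined with ';'.
     */
    static String format(Map<String, Object> info) {
        if (info.isEmpty()) {
            return "."; // VCF uses '.' for an empty INFO column
        }
        StringJoiner joined = new StringJoiner(";");
        for (Map.Entry<String, Object> entry : info.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            joined.add(value == null ? key : key + "=" + value);
        }
        return joined.toString();
    }
}
```

Passing a LinkedHashMap keeps the fields in insertion order, so a map holding DP=14 followed by the flag HM renders as "DP=14;HM".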
chartl
20f5fdbcf7
Changes to MVC to make the header of its output VCF compliant with spec (give expected # of values for info field annotations)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3660 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-28 18:33:23 +00:00
aaron
682f9b46c6
Two fixes together:
...
1) Some improvements to the VCF4 parsing, including disabling validation.
2) Reimplemented RefSeq in the new Tribble-style rod system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3630 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-24 22:17:03 +00:00
chartl
75d4736600
Committing changes to comp overlap for indels. Passes all integration tests; minor changes to MVC walker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3618 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-23 15:49:13 +00:00
aaron
a6d3e4bd47
Add code to allow reference alleles with 'N' in VariantContext, but not in the alternate allele(s). Also more updates to the VCF 4 code (fixed parsing for files without genotypes).
...
This check-in will temporarily break the build (I need to see if Bamboo is correctly returning the log file for the failed builds).
Will be fixed once Bamboo starts building.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3609 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 18:26:37 +00:00
aaron
32f324a009
incremental changes to the VCF4 codec, including allele clipping down to the minimum reference allele; adding unit testing for certain aspects of the parsing. Not ready for prime-time yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3604 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-22 06:31:05 +00:00
hanna
f18ac069e2
A refactoring / unification of ReadBackedPileup and ReadBackedExtendedEventPileup.
...
Provides a cleaner interface with extended events inheriting all of the basic RBP
functionality. Implementation is still slightly messy, but should allow users to
provide separate implementations of methods for sample split pileups and unsplit
pileups for efficiency's sake.
Methods not covered by unit/integration tests have not been sufficiently tested yet.
Unit tests will follow this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3597 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-20 04:42:26 +00:00
chartl
f44d8b150f
Mendelian Violation Classifier now filters violations on the fly via command line arguments; and closes unterminated homozygous regions at the end of a chromosome (so we see arms falling off in the file, rather than in the log)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3592 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-18 19:32:24 +00:00
ebanks
b75ded61b8
Removing obsolete rod; no longer needed given previous addition to SampleUtils.
...
JIRA GSA-318
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3572 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-16 20:03:14 +00:00
weisburd
1e42984a16
Improved buffer-size arg handling
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3553 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-14 19:59:15 +00:00
kiran
804facb0cc
Removing these utilities as part of a hostage negotiation with Matt. Can I have my journal club paper now?!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3539 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-11 21:41:29 +00:00
weisburd
338bb9adf4
CommandLineProgram for measuring java I/O speeds for large plain-text or gzipped files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3532 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-10 21:34:37 +00:00
chartl
20167fd411
Final changes to MVC -- associates variants with regions of homozygosity in child and parents, corrects for genotype errors, and prints out a separate file with information for each region of homozygosity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3521 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 18:05:37 +00:00
ebanks
9b2fcc4711
Refactoring of the annotation system:
...
1. VA is now a ROD walker so it no longer requires reads (needs a little more testing)
2. Annotations can now represent multiple INFO fields (i.e. sets of key/value pairs)
3. The chromosome count annotations have been pulled out of UG and the VCF writer code and into VA where they belong. Fixed the headers too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3513 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 17:05:51 +00:00
chartl
8f9e3e8ad7
Commit for Kiran; but this is now working, barring little exceptions that I've yet to run across...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3511 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-09 14:21:19 +00:00
chartl
736098b58d
A quick commit before running home. This is a re-factored version of the OppositeHomozygoteClassifier which will work with deNovo violations as well. Some code still needs to be migrated from OHC, which is why that walker isn't yet deleted. This'll be up and running tonight.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3502 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 20:47:01 +00:00
chartl
933133ee28
Initial commit of the opposite homozygote classifier. Currently does the following, given a trio vcf:
...
+ Identifies opposite homozygote sites
+ Identifies the parent from whom it is expected that a null allele was inherited (or whether it was a putative genotype error; e.g. mom=homref, dad=homref, child=homvar)
+ Labels each opposite homozygote with its homozygous region in the child (e.g. region 1, region 2)
+ Labels each opposite homozygote with the size of the homozygous region in which it was found, the number of child homozygotes in the region, and the number of opposite homozygote violations within that region
To come:
+ Classification of sites as likely tri-allelic
Note that this is very experimental
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3498 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-08 03:56:07 +00:00
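The classification described in 933133ee28 can be sketched for a single trio site as follows. This is one reading of the commit message, not the walker's code: the enum and method names are hypothetical, and the rule "both parents opposite means a putative genotype error, one parent opposite means a null allele from that parent" is an assumption inferred from the mom=homref, dad=homref, child=homvar example.

```java
/** Hypothetical sketch of opposite-homozygote classification for one trio site. */
final class OppositeHomozygoteSketch {
    enum Genotype { HOM_REF, HET, HOM_VAR }

    enum Call { NONE, NULL_ALLELE_FROM_MOM, NULL_ALLELE_FROM_DAD, PUTATIVE_GENOTYPE_ERROR }

    /** True when parent and child are homozygous for different alleles. */
    static boolean opposite(Genotype parent, Genotype child) {
        return (parent == Genotype.HOM_REF && child == Genotype.HOM_VAR)
            || (parent == Genotype.HOM_VAR && child == Genotype.HOM_REF);
    }

    /** Assumed rule: both parents opposite is an error; one parent opposite implies a null allele. */
    static Call classify(Genotype mom, Genotype dad, Genotype child) {
        boolean momOpposite = opposite(mom, child);
        boolean dadOpposite = opposite(dad, child);
        if (momOpposite && dadOpposite) {
            return Call.PUTATIVE_GENOTYPE_ERROR;
        }
        if (momOpposite) {
            return Call.NULL_ALLELE_FROM_MOM;
        }
        if (dadOpposite) {
            return Call.NULL_ALLELE_FROM_DAD;
        }
        return Call.NONE;
    }
}
```

Region labeling and the per-region counts would sit on top of a per-site call like this one; they are left out of the sketch.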
delangel
ef47a69c50
a) First fully functional (sort of) version of walker that parses Beagle imputation output files and produces a VCF with imputed genotypes.
...
More doc/info to follow shortly. Issues still to be solved:
a) Walker changes all genotypes based on Beagle data, but annotations on the original VCF are unchanged. They should in theory be recomputed based on new genotypes.
b) Current implementation is ugly, dirty, unwieldy and will necessitate a refactoring soon so I can keep my pride. The most aesthetically affronting issue right now is that we read the full Beagle files at initialization and keep them in memory, but a more delicate implementation would just read from files on a marker-by-marker basis. The issue that currently prevents this is that BufferedReader() instances don't seem to play nice when called from the map() function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3488 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 20:37:25 +00:00
depristo
b811e61ae1
Optimized, nearly complete VCF4 reader 2-4x faster than the previous implementation, along with a VCF4 reader performance testing walker that can read 3/4 files, useful for benchmarking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3487 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 18:11:38 +00:00
aaron
0b03e28b60
updating the tribble library to include the reference dictionary reading / writing. We now check the dictionaries of any tracks that have them against the reference (all new tribble tracks and out-of-date tracks will have this). Also renamed some classes to be more reflective of their function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3485 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-04 06:34:26 +00:00
ebanks
ffeb3fd80d
Thanks to Guillermo, I found a bug in the Unified Genotyper output: GL was posteriors instead of likelihoods. Not a huge deal because the priors were flat, but fixed nonetheless.
Also, needed to update Tribble.
Minor updates to the Beagle input maker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3461 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 19:28:26 +00:00
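Why flat priors made the GL bug in ffeb3fd80d mostly harmless can be seen from Bayes' rule. This is a general identity, not something taken from the Unified Genotyper code; K, the number of candidate genotypes, is notation introduced here.

```latex
\begin{align}
P(G \mid D) &= \frac{P(D \mid G)\,P(G)}{\sum_{G'} P(D \mid G')\,P(G')} \\
P(G) = \tfrac{1}{K} \;\Rightarrow\; P(G \mid D) &= \frac{P(D \mid G)}{\sum_{G'} P(D \mid G')}
\end{align}
```

With a flat prior the posterior is the likelihood rescaled to sum to one, so the genotype ranking and the likelihood ratios are unchanged, but the raw values written to GL still differ from the likelihoods.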
aaron
871cf0f4f6
Call out ROD types by their record type, instead of the codec type (which was clumsy). So instead of:
...
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFCodec.class))
you'd say:
@Requires(value={},referenceMetaData=@RMD(name="eval",type= VCFRecord.class))
Which is more in-line with what was done before. All instances in the existing codebase should be switched over.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3457 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-28 14:52:44 +00:00
chartl
ff4a0764df
Read error rate is now parallelizable
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3447 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-27 19:00:09 +00:00
delangel
3873dccb35
First fully functional (though preliminary) version of walker that takes an input VCF and outputs a Beagle .bgl file that can be used for missing genotype calls/haplotype imputation. For now, only supported input format is likelihood format for unrelated individuals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3444 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 21:03:23 +00:00
chartl
f9efc1248c
VariantEvalWalker now takes indels if you throw the -dels flag. IndelLengthHistogram appears to be working properly, it is turned off by default (as it is experimental) but you can turn it on in your own repository.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3443 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 20:03:14 +00:00
chartl
88a06ad81f
Changes to Depth of Coverage:
...
- For speedup in large number of samples, base counts are done on a per read group level, then
merged into counts on larger partitions (samples, libraries, etc)
+ passed all integration tests before next item
- Added additional summary item, a coverage threshold. Set by (possibly multiple) -ct flags,
the summary outputs will have columns for "%_bases_covered_to_X"; both per sample, and
per sample per interval summary files are affected (thus md5s changed for these)
NOTE:
This is the last revision that will include the per-gene summary files. Once DesignFileGenerator is sufficiently general, and has integration tests, it will be moved to core and the per-gene summary from Depth of Coverage will be retired.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3437 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 03:39:22 +00:00
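The two Depth of Coverage changes in 88a06ad81f can be sketched as below. This is a minimal sketch under stated assumptions: the class, method, and parameter names are hypothetical, counts are modeled as plain maps and arrays, not the walker's own data structures, and "covered to X" is taken to mean depth greater than or equal to X.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the DepthOfCoverage implementation. */
final class CoverageMergeSketch {
    /** Sums per-read-group base counts into counts for a larger partition (here: samples). */
    static Map<String, Long> mergeBySample(Map<String, Long> countsByReadGroup,
                                           Map<String, String> sampleOfReadGroup) {
        Map<String, Long> bySample = new HashMap<>();
        for (Map.Entry<String, Long> entry : countsByReadGroup.entrySet()) {
            String sample = sampleOfReadGroup.get(entry.getKey());
            if (sample == null) {
                throw new IllegalArgumentException("No sample for read group " + entry.getKey());
            }
            bySample.merge(sample, entry.getValue(), Long::sum);
        }
        return bySample;
    }

    /** Percentage of bases whose depth is at least the threshold (assumed meaning of "covered to X"). */
    static double percentCoveredTo(int[] depthPerBase, int threshold) {
        if (depthPerBase.length == 0) {
            return 0.0;
        }
        int covered = 0;
        for (int depth : depthPerBase) {
            if (depth >= threshold) {
                covered++;
            }
        }
        return 100.0 * covered / depthPerBase.length;
    }
}
```

Counting once per read group and merging afterwards means each read is tallied a single time, however many partitions (samples, libraries, and so on) are reported.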
hanna
a40e64e47b
A downsampling validator. Compares the generated pileup passed in from the alignment context to the reads, passed in as a Tribble SAM text feature. If the generated pileup contains a valid set of reads according to the downsampling rules, the test passes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3421 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-23 21:49:54 +00:00
delangel
355396109b
Bug fix to avoid build failure (class changed under me??)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3416 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 18:48:56 +00:00
delangel
1753d07b02
Added AnnotationByAlleleFrequencyWalker - walker takes an input vcf, a reference vcf and a list of annotations (with the -A argument). For each site present in both VCF's, it outputs the given annotations into the screen as well as allele frequency. Since HapMap vcf reference doesn't include AF in annotations, it computes it from Chromosome, Het and HomVar counts.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3415 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 18:31:34 +00:00
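The allele-frequency fallback described in 1753d07b02 (computing AF from Chromosome, Het and HomVar counts) might look like the sketch below. It assumes a biallelic site in diploid samples and that the chromosome count is the total number of called chromosomes; the class and parameter names are hypothetical and not taken from the walker.

```java
/** Hypothetical sketch; assumes a biallelic site and diploid samples. */
final class AlleleFrequencySketch {
    /**
     * Each het genotype carries one alternate allele and each hom-var genotype
     * carries two, so AF = (het + 2 * homVar) / chromosomeCount.
     */
    static double alleleFrequency(int chromosomeCount, int hetCount, int homVarCount) {
        if (chromosomeCount <= 0) {
            throw new IllegalArgumentException("chromosomeCount must be positive");
        }
        return (hetCount + 2.0 * homVarCount) / chromosomeCount;
    }
}
```

For example, 20 het and 5 hom-var genotypes among 120 called chromosomes give 30 alternate alleles, an allele frequency of 0.25.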
depristo
a10fca0d5c
Genotyper now is using bytes not chars. Passes all tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 21:02:44 +00:00
depristo
1ab00e5895
Retiring multi-sample genotyper
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3401 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 14:10:56 +00:00
depristo
727822adb4
BaseUtils has a clearer distinction between byte and char routines. All char routines are @Deprecated now. Please use bytes. Better organization of reverse(), now in Utils not BaseUtils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3400 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 14:05:13 +00:00
depristo
8a725b6c93
Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[] backed not char[]. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 23:27:55 +00:00
aaron
02cc1afdc8
remove RodBed and all its dependencies.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3396 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 19:12:30 +00:00
chartl
ffb1b46166
Added a GCCalculatorWalker for a oneoff analysis for Mark Daly (GC content of agilent 1.1 targets)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3395 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 18:49:51 +00:00
chartl
b7d21627ab
Changes to DepthOfCoverage (JIRA items) and added back an integration test to cover it. Alterations to the design file generator to output all transcripts (rather than choosing one at random).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3366 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 17:23:00 +00:00
aaron
2c55ac1374
fixes for parallel processing problems with Tribble, a small bug in the resource pool, and some more documentation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3349 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-12 06:13:26 +00:00
kiran
aec5f7b630
Can now threshold results based on minimum base and/or mapping quality.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3343 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 19:58:07 +00:00
kiran
13fd182b7c
For dealing with slightly malformatted BAMs - mark every alignment as primary, or in the case of some BAM files from UWash, supply the sample information for each read group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3340 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 15:17:05 +00:00
kiran
98718d0faa
Computes the error rate per cycle
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3336 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-10 14:50:22 +00:00
aaron
7d2df3f511
example windowed ROD walker for Kristian, and updates to Tribble
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3325 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-07 17:12:50 +00:00
aaron
a68f3b2e9c
VCF moved over to tribble.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3302 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 17:28:48 +00:00
chartl
617542853f
Walker that can be used with refGene and a TCGA bed file to annotate intervals in an interval list with the genes and exons they overlap.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3296 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-05 02:55:01 +00:00
aaron
f497213933
DbSNP moved over to tribble
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3288 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-03 06:02:35 +00:00
ebanks
1714c322c2
Reorg of UG args; checking in first before upcoming changes that will break integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3274 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-30 14:48:46 +00:00
aaron
7fbfd34315
adding the GELI ROD validation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3270 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-29 21:43:00 +00:00
aaron
68bdac254b
a utility walker for validating changes made to the underlying ROD system in the transition to Tribble.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3258 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-26 05:21:24 +00:00
chartl
121163dd49
interim commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3240 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 13:44:45 +00:00
ebanks
84ebceb9a6
Fix for Chris: need to use the appropriate conversion method. Added a warning to the adaptor.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3235 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 02:05:10 +00:00
chartl
e7334ec11f
Checkin for Eric (IndelDBRateWalker is a prelude to a VariantEval module for comparisons for indels)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3234 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-22 00:40:27 +00:00
chartl
84f1ccd6ac
Two dumb oneoff walkers written to fix & annotate the Baylor indel calls (which came in sans reference, and without coding/intron annotations).
...
ERIC -- does the IndelAnnotator (the RefSeq lookup code I stole from IndelGenotyperV2) want to be its own Annotation inside VariantAnnotator? Is Andrey already doing this as part of adding indel calling to UG?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3226 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-21 14:04:10 +00:00
hanna
c1e53d407d
The copyright tag that I copied/pasted from a LaTeX document into IntelliJ had unicode quote characters embedded in it. These characters were invisible inside IntelliJ but cause compile warnings for Ryan and Aaron, who for whatever reason have a different default charset. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3203 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 15:26:32 +00:00
aaron
b5f6f54968
Almost done removing any trace of the old Variation and Genotype interfaces.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3202 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-20 14:52:15 +00:00
hanna
1bc26f69e9
An attempt to cleanup the Utils directory. Email to follow.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 23:00:08 +00:00
ebanks
d73c63a99a
Redoing the conversion to VariantContext: instead of walkers passing in a ref allele, they pass in the ref context and the adaptors create the allele. This is the right way of doing it.
...
Also, adding some more useful integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3194 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-19 05:47:17 +00:00
aaron
e11ca74eb5
removing some outdated ROD classes (PooledEMSNPROD and SangerSNPROD), removing an out-of-date interface (VariantBackedByGenotype), and moving AnalyzeAnnotationWalker over to VariationContext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3188 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-16 18:59:29 +00:00
ebanks
d5e5589b8f
No longer used
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3187 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-16 17:57:39 +00:00
ebanks
e702bea99f
Moving VE2 to core; calling it "VariantEval" (one more checkin coming)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3179 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:25:47 +00:00
ebanks
ac9dc0b4b4
Removing VariantEval (v1); everyone should be using VE2 now. Docs coming ASAP.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3177 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:53:02 +00:00
ebanks
3330e254a9
Standardize the dbsnp track name in preparation for case-sensitivity
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3176 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 19:41:57 +00:00
ebanks
5f7564bf0a
Better naming of output columns
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3175 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 18:08:07 +00:00
ebanks
04909fa6ad
Removing arbitrary selects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3169 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 17:46:39 +00:00
ebanks
f1189bac5a
Bug fix: final map call wasn't being triggered (because we returned when ref==null before applying update0)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3168 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 16:58:55 +00:00
ebanks
dde092fb61
Added the ability in VE2 to select which eval modules to run, so that you aren't forced to use all of them. You can use --list to list all of the possible modules to run.
...
Heads up everyone: by default, *no* modules are run. Please add "-all" to your scripts to maintain the previous behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3161 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 22:15:58 +00:00
ebanks
0b575596f8
Fix for concordance: samples found only in truth no longer kill it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3160 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 21:33:49 +00:00
rpoplin
c2a37e4b5c
Variant Quality Score modules in VariantEval2 no longer create huge lists holding every quality score encountered; instead they cast each quality score to an integer and count it in a hash table. Also a bug fix for files in which all the quality scores are set to -1.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3146 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:36:06 +00:00
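The memory change in c2a37e4b5c (integer-cast quality scores counted in a hash table, in place of a list of every score) can be sketched like this. QualityHistogramSketch is a hypothetical name and the code is an illustration of the idea, not the VariantEval2 module itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: counts quality scores per integer bin, without storing each score. */
final class QualityHistogramSketch {
    private final Map<Integer, Long> counts = new HashMap<>();

    /** Truncates the score toward zero and increments that bin's count. */
    void add(double qual) {
        counts.merge((int) qual, 1L, Long::sum);
    }

    /** Number of scores seen in the given integer bin (0 if none). */
    long count(int bin) {
        return counts.getOrDefault(bin, 0L);
    }

    /** Number of distinct bins, which bounds the memory used. */
    int bins() {
        return counts.size();
    }
}
```

Memory now grows with the number of distinct integer scores and not with the number of variants. A file where every score is -1 collapses into a single bin, which is worth handling explicitly in anything that later divides by a range of bins.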
chartl
7025f5b51d
Added an auxiliary table to DepthOfCoverage, which is the cumulative equivalent of the locus table (got tired of doing the calculation by hand). Also took care of a trailing tab in the per-locus output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3138 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-08 19:37:17 +00:00
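The cumulative table added in 7025f5b51d can be sketched from a per-depth locus histogram. This assumes the locus table holds, for each depth d, the number of loci with exactly that depth, and that "cumulative" means loci with depth of at least d; both are assumptions about the output, and the names below are hypothetical.

```java
/** Hypothetical sketch; not the DepthOfCoverage table code. */
final class CumulativeCoverageSketch {
    /**
     * Given lociAtDepth[d] = number of loci with depth exactly d, returns
     * cumulative[d] = number of loci with depth >= d, in one backward pass.
     */
    static long[] atLeast(long[] lociAtDepth) {
        long[] cumulative = new long[lociAtDepth.length];
        long running = 0;
        for (int d = lociAtDepth.length - 1; d >= 0; d--) {
            running += lociAtDepth[d];
            cumulative[d] = running;
        }
        return cumulative;
    }
}
```

For a locus table of {5, 3, 2} (depths 0, 1, 2), the cumulative row is {10, 5, 2}.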
aaron
20cc2a85a4
removed the hashmap from Genotype Concordance, moved it into a table
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3133 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 21:24:48 +00:00
aaron
e55f27b3b1
forgot a file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3132 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:51:13 +00:00
depristo
918b746798
More detailed validation output. Fixes for genotyping overflow -- these are temporary and need to be properly resolved
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3129 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 16:38:28 +00:00
rpoplin
7b44e6bd55
ApplyVariantClusters now outputs interesting threshold points based on hitting the target novel TiTv
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3126 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 19:47:29 +00:00
rpoplin
60c227d67f
Added new VE2 module to create a plot of titv ratio by variant quality score
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3125 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:19:27 +00:00
rpoplin
2d002c56c3
Added histogram of variant quality scores broken out by true positive and false positive calls to the GenotypeConcordance module of VariantEval2
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3123 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 13:48:31 +00:00
chartl
f7d1b8f5de
CoverageStatistics has now replaced DepthOfCoverage -- old DoC is in the archive.
...
Also, I can't be bothered to fix the spelling of "oldepthofcoverage" to contain the necessary number of D's. Be content that it does, however, contain the requisite number of O's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3109 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:27:23 +00:00
aaron
585cc880a2
changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:23:14 +00:00
aaron
3d3d19a6a7
the last-mile commit for Tribble integration. The system is now ready for Tribble to be turned on, as soon as we've removed any dependencies in the ROD code on interfaces that aren't in the Tribble library (i.e. the Variation or Genotype interface on RODs). All of the walkers should be up to date.
...
a caveat: for anyone asking for all of the RODs back from the RefMetaDataTracker (if you're not using the facilities to get the track by name), you'll now be getting back a collection of GATKFeature objects. This object will contain the track name, and a method for getting the underlying object (getUnderlyingObject()), which will be the traditional RodVCF, rodDbSNP, etc. This layer is needed so we can integrate Tribble tracks (which don't natively have names). Calls that ask for RODs by name will still get back the traditional reference ordered data objects (RodVCF, rodDbSNP, etc).
Sorry for the inconvenience! More changes to come, but this is by far the largest (and has the greatest effect on end users).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3104 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-31 22:39:56 +00:00
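For walkers affected by the caveat in the entry above (trunk@3104), unwrapping might look roughly like this. Only `getUnderlyingObject()` is named in the commit message; `NamedFeature`, `getName()` and `FakeRodVCF` are hypothetical stand-ins for illustration and are not the real GATK types.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class FeatureUnwrapExample {
    // Hypothetical stand-in for the wrapper: a track name plus the wrapped object.
    public static class NamedFeature {
        private final String trackName;
        private final Object underlying;

        public NamedFeature(String trackName, Object underlying) {
            this.trackName = trackName;
            this.underlying = underlying;
        }

        public String getName() { return trackName; }              // assumed accessor
        public Object getUnderlyingObject() { return underlying; } // named in the commit
    }

    // Hypothetical stand-in for a traditional ROD type such as RodVCF.
    public static class FakeRodVCF {
        public final String id;
        public FakeRodVCF(String id) { this.id = id; }
    }

    // Keep only the features on the given track whose payload is a FakeRodVCF.
    public static List<FakeRodVCF> vcfRecords(Collection<NamedFeature> features, String track) {
        List<FakeRodVCF> out = new ArrayList<>();
        for (NamedFeature f : features) {
            Object o = f.getUnderlyingObject();
            if (f.getName().equals(track) && o instanceof FakeRodVCF) {
                out.add((FakeRodVCF) o);
            }
        }
        return out;
    }
}
```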
chartl
dc802aa26f
Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
aaron
074ec77dcc
First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be
...
specified with the reportType command line option in VE2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
ebanks
2373a4618f
bug caused by a misprint: context != contexts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3073 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-25 03:08:24 +00:00
aaron
60dfba997b
added some sample annotations to VariantEval2 analysis modules, and some changes to the report system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-24 05:40:10 +00:00
depristo
076d21d394
Minor bug workaround in GenotypeConcordance module (see todo). General platform read filter. You can say -rl Platform illumina to remove all SLX reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3054 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-22 02:47:09 +00:00
depristo
7b17bcd0af
Refactoring a few useful routines for detecting mendelian violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3043 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-19 13:19:01 +00:00
ebanks
b8e8852b4f
Better interface for the Annotator in how it interacts with VariantContext.
...
Also, added a proof of concept genotype-level annotation (not working yet, almost there).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3035 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 20:41:57 +00:00
ebanks
ee0e833616
Some significant changes to the annotator:
...
1. Annotations can now be "decorated" with any arbitrary interface description - not just standard or experimental.
2. Users can now specify not only individual annotations to use, but also the interface names from #1. Any number of them can be specified, e.g. -G Standard -G Experimental -A RankSumTest.
3. These same arguments can be used with the Unified Genotyper for when it calls into the Annotator.
4. There are now two types of annotations: those that are applied to the INFO field and those that are applied to specific genotypes (the FORMAT field) in the VCF (however, I haven't implemented any of these latter annotations just yet; coming soon).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3029 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-18 05:38:32 +00:00
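One way to picture the interface "decoration" from the entry above (trunk@3029) is marker interfaces looked up by name. The sketch below is an assumption about the general technique, not the Annotator's actual code; every class and interface here (including the toy `RankSumTest`) is a hypothetical stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AnnotationSelection {
    public interface Annotation { String name(); }

    // Marker interfaces acting as group labels.
    public interface Standard {}
    public interface Experimental {}

    public static class Depth implements Annotation, Standard {
        public String name() { return "Depth"; }
    }
    public static class RankSumTest implements Annotation, Experimental {
        public String name() { return "RankSumTest"; }
    }
    public static class HomopolymerRun implements Annotation {
        public String name() { return "HomopolymerRun"; }
    }

    // Select annotations that carry a requested group interface (like -G)
    // or were requested by their own name (like -A).
    public static List<String> select(List<Annotation> all, Set<String> groups, Set<String> names) {
        List<String> chosen = new ArrayList<>();
        for (Annotation a : all) {
            boolean inGroup = false;
            for (Class<?> iface : a.getClass().getInterfaces()) {
                if (groups.contains(iface.getSimpleName())) {
                    inGroup = true;
                }
            }
            if (inGroup || names.contains(a.name())) {
                chosen.add(a.name());
            }
        }
        return chosen;
    }
}
```

Because membership is read off the class's interfaces, adding a new group needs only a new marker interface rather than a change to the selection logic.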
ebanks
e367a50e9b
Added genotype concordance module. Not at all finished, but needed to give something to Aaron to look at for help in printing the output nicely.
...
Also misc cleanup and fixes (e.g. perform evaluation even when no comp tracks are provided).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2996 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 19:02:24 +00:00
depristo
b39b5edca8
Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
depristo
18ba9929f9
notes for eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2983 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 20:34:54 +00:00
depristo
4f4555c80f
PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
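For reference, the two metrics named in the entry above (trunk@2978) have standard definitions: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The helper below is a minimal sketch of those formulas; the class name and the choice to return NaN for an empty denominator are assumptions, not taken from the validation tool.

```java
public class ValidationMetrics {
    // Positive predictive value: fraction of positive calls that are true.
    public static double ppv(long truePositives, long falsePositives) {
        long denom = truePositives + falsePositives;
        return denom == 0 ? Double.NaN : (double) truePositives / denom;
    }

    // Sensitivity: fraction of true sites that were called.
    public static double sensitivity(long truePositives, long falseNegatives) {
        long denom = truePositives + falseNegatives;
        return denom == 0 ? Double.NaN : (double) truePositives / denom;
    }
}
```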
depristo
486bef9318
Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
chartl
0a49dffa8f
Row/Column names are now R-friendly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
ebanks
9f3b99c11b
Moving UnifiedGenotyper and VariantAnnotator over to VariantContext system.
...
Removing obsolete genotyping classes.
First stage of removing dependence on old Genotype class.
More changes to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2960 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 03:41:07 +00:00
chartl
21bf8b4b93
Odd, what I saw on IntelliJ hadn't saved to sting before committing. Here's the actual change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2956 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 15:54:41 +00:00
chartl
cc6a714c09
Handle excess coverage in interval output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2954 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 14:40:05 +00:00
chartl
037ac9c9af
Actually calculate base counts by read group when "both" is specified. Modified integration test to cement the now-correct "both" behavior.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2941 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 18:31:48 +00:00
chartl
8738c544f1
Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
depristo
33cefddf55
Better INFO field annotation for Mendel violations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2937 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 15:22:04 +00:00
chartl
706d49d84c
Commit for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
ebanks
0dd65461a1
Various improvements to plink, variant context, and VCF code.
...
We almost completely support indels. Not yet done with plink stuff.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2926 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 17:58:01 +00:00
chartl
6759acbdef
Coverage statistics now fully implements DepthOfCoverage functionality, including the ability to print base counts. Minor changes to BaseUtils to support 'N' and 'D' characters. PickSequenomProbes now has the option to not print the whole window as part of the probe name (e.g. you just see PROJECT_NAME|CHR_POS and not PROJECT_NAME|CHR_POS_CHR_PROBESTART-PROBEND). Full integration tests for CoverageStatistics are forthcoming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2924 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 15:00:02 +00:00
aaron
790d2a7776
adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
chartl
cfff486338
This commit is for Kiran
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2898 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 18:18:38 +00:00
chartl
87f8fb7282
Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
chartl
496ecc8186
Change in how overall coverage and means are stored in the DOCS object; change from keeping track of sample mean coverage to keeping track of sample total coverage (calculate means at the end)
...
This is a mid-way commit for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2895 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 15:51:12 +00:00
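The switch from running means to running totals in the entry above (trunk@2895) can be sketched like this. The class and method names are hypothetical and the real DOCS object is not shown in the log; the point is only that totals accumulate cheaply and the division happens once, at the end.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: accumulate per-sample total coverage and derive means on demand.
public class SampleCoverageTotals {
    private final Map<String, Long> totalBySample = new HashMap<>();
    private long lociSeen = 0;

    // Record one locus: add each sample's depth to its running total.
    public void addLocus(Map<String, Integer> depthBySample) {
        lociSeen++;
        for (Map.Entry<String, Integer> e : depthBySample.entrySet()) {
            totalBySample.merge(e.getKey(), e.getValue().longValue(), Long::sum);
        }
    }

    public long total(String sample) {
        return totalBySample.getOrDefault(sample, 0L);
    }

    // Mean coverage over all loci seen; computed only when asked for.
    public double mean(String sample) {
        return lociSeen == 0 ? Double.NaN : (double) total(sample) / lociSeen;
    }
}
```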
depristo
9a6b384adb
Support for no qual fields in VCF; better support for Mendelian violation calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
aaron
246fa28386
RODs for reads phase 2: modified RODRecordList to implement List<ReferenceOrderedDatum> so I could stub it out for testing, added a FlashBackIterator which is needed to prevent the ResourcePool from opening infinity+1 iterators, and some other interfaces to make unit testing much smoother.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2892 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 22:48:55 +00:00
chartl
591102a841
Don't close the output stream if we're printing to stdout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2891 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:50:58 +00:00
chartl
10cc71ceb0
Another midway commit for teh engineerz
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2890 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 21:24:02 +00:00
chartl
3d92e5a737
Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
...
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).
Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
aaron
fef1154fc8
starting on RODs for Reads: made RODRecordList implement list<RODatum> (so we can sub in fake lists during testing), and removed unnecessary generic-ness. Removed BrokenRODSimulator, which isn't being used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2884 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 22:11:53 +00:00
chartl
5df37968de
Simplification of code segments; slight alteration to per-locus tabulation; added to-do items for cosmetic changes (mostly binning options and settings)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2882 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-24 05:20:18 +00:00
chartl
1f673e9fab
Float the bins with the given lower bound
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2878 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:48:53 +00:00
chartl
119d449b46
Formatting changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2877 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 20:43:15 +00:00
chartl
173956927b
Summaries generated for firehose from DoC output have been migrated to their own walker to calculate aggregate coverage statistics in a parallelizable and fast way. This is an initial commit; bug-fixing and testing are upcoming.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2876 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-23 18:41:02 +00:00
chartl
0e05a3acb0
Adding depth of coverage features to firehose summary tools
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2860 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-19 19:47:16 +00:00
aaron
b1a4e6d840
removing non-ascii characters from my Copyright and from VariantEval2Walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2856 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-18 18:54:36 +00:00
depristo
a1a3d5fcb0
Support for reading in table of rsIDs -> dbSNP builds to back generate a dbSNP build X from a single file. Very useful indeed. dbSNP -> VC now captures the rsID in the context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2837 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 22:40:55 +00:00
chartl
04a2784bf7
Initial commit of tools under development for data QC through firehose.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2834 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 19:13:24 +00:00
depristo
5f74fffa02
Massive improvements to VE2 infrastructure. Now supports VCF writing of interesting sites; multiple comp and eval tracks. Eric will be taking it over and expanding functionality over the next few weeks until it's ready to replace VE1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2832 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:26:52 +00:00
depristo
c66861746a
improvements to ve2, including more meaningful mendelian violation counting. Support for VCF emitted interesting sites, annotated according to the evaluations themselves. Basic integration test for VE2 started
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2819 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 16:12:29 +00:00
depristo
934d4b93a2
VariantContext to VCF converter. BeagleROD, and phasing of VCF calls. Integration tests galore :-)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2814 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 19:02:25 +00:00
depristo
94f892ad42
VCF->beagle and VCF phasing using beagle input. Appears to work fairly well. VariantContexts now support phased genotypes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2812 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-09 01:22:05 +00:00
chartl
935e76daa1
Minor changes to oneoff walkers. PlinkRod altered but still commented.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2808 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-08 18:49:56 +00:00
depristo
3b1ab86d11
Added generic interfaces to RefMetaDataTracker to obtain VariantContext objects. More docs. Integration tests for VariantContexts using dbSNP and VCF. At this stage if you use dbSNP or VCF files only in your walkers, please move them over to the VariantContext, it's just nicer. If you've got RODs that implemented the old variation/genotype interfaces, and you want them to work in new walkers, please add an adaptor to VariantContextAdaptors in refdata package. It should be easy and will reduce burden in the long term when those interfaces are retired.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2803 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:26:06 +00:00
depristo
995d55da81
now uses the new RMDT getVariantContext() functions instead of doing the work itself.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2802 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-06 16:23:06 +00:00
depristo
af8c47fc2f
Fixing up testVariantContext for integration tests for variant context. Printing of VCs and genotypes now stable using sorting. Cleaned up comments in quality score by strand. RefMetaDataTracker now directly allows walkers to obtain VariantContexts using the simple Collection<VariantContext> getAllVariantContexts(GenomeLoc curLocation, EnumSet<VariantContext.Type> allowedTypes, boolean requireStartHere, boolean takeFirstOnly) function. VCF and dbSNP VariantContexts now officially supported. Other important types can be added to the adaptor system in refdata package. Integration tests later today
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2791 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 15:42:54 +00:00
depristo
c6d86da4b8
almost managed to move things around perfectly in one go
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:18:26 +00:00
depristo
e0af3bf761
updating back names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2786 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:53:45 +00:00
depristo
777617b6c7
managed to actually move the files too! Damn you svn
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2785 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:47:19 +00:00
depristo
8938a4146d
moving varianteval2 to its own dir
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2784 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:37:04 +00:00
depristo
69132c81aa
Documentation. Plus nicer structure to adaptors. Intermediate checkin before move into core
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2783 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 13:33:27 +00:00
depristo
1d86dd7fd1
Interface changes following Matt's advice. VariantContexts are now immutable, and there are special mutable versions, in case you need to change things. AttributedObject is now an InferredGeneticContext and package protected. VariantContexts are now named, which makes them easier to use with the rod system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2780 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 20:55:49 +00:00