Commit Graph

2883 Commits (f7c9f131ea1fd573775ba53acdd88882d1b3eade)

Author SHA1 Message Date
weisburd 8db7c97c4d Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3427 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:38:54 +00:00
weisburd 4aa749c709 Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3426 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:38:07 +00:00
weisburd aca3bcb193 Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3425 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:37:17 +00:00
weisburd 64ed770250 Moved AnnotatorInputTableFeature and Codec to org.broadinstitute.sting.gatk.refdata.features.annotator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3424 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-24 14:36:28 +00:00
hanna ee3f2eb1d0 Don't output traversal reduce result in the logger. In many cases, the reduce
result is tangential to the product of the analysis and having the logger always
emit it can confuse the output (such as in the new reduceByInterval 
DepthOfCoverage walker).  If users want to emit it, they can choose not to override
onTraversalDone, or override onTraversalDone and write results to the output
stream / logger / whatever their choice.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3422 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-23 22:41:43 +00:00
hanna a40e64e47b A downsampling validator. Compares the generated pileup passed in from the alignment context to the reads,
passed in as a Tribble SAM text feature.  If the generated pileup contains a valid set of reads according to
the downsampling rules, the test passes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3421 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-23 21:49:54 +00:00
delangel a280a0ff0d a) Made HaplotypeScore default annotation. This changed several integration tests, whose MD5 is now updated.
b) Disabled BaseQualRankSumTest, the returned p-values differ wildly from Matlab/R-provided ones, cause TBD.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3419 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 22:25:17 +00:00
hanna b10950c691 Simple performance optimization -- cache the number of reads in the locus hanger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3417 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 19:26:16 +00:00
delangel 355396109b Bug fix to avoid build failure (class changed under me??)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3416 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 18:48:56 +00:00
delangel 1753d07b02 Added AnnotationByAlleleFrequencyWalker - walker takes an input vcf, a reference vcf and a list of annotations (with the -A argument). For each site present in both VCF's, it outputs the given annotations into the screen as well as allele frequency. Since HapMap vcf reference doesn't include AF in annotations, it computes it from Chromosome, Het and HomVar counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3415 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 18:31:34 +00:00
chartl 745d7c582f added integration test for intervals with no coverage due to filtering
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3414 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 16:52:42 +00:00
chartl 7fb3f2d3eb Annotator now buffers indel calls (prevents double-output from double-calls to map)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3413 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 16:34:34 +00:00
chartl 4e834b5e35 VFW now uses a ref window and thus is compatible with indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3412 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 15:59:42 +00:00
chartl 88cb93cc3c Changes to Depth of Coverage (added maximum base and mapping quality flags; with new integration tests -- because they use b36, and the other test uses hg18, it's in a different class (integration test system can't change refs on the fly)). Initial change to VariantAnnotator to allow it to see extended event pileups; you currently have to throw the -dels flag; and it's specified as "very experimental". Yet, all the integration tests pass.
Homopolymer Run now does the "right" thing (e.g. single bases are represented as HRun = 0 rather than HRun = 1) for indels. AlleleBalance now does something close enough to correct.

Added a convenience method to VariantContext that will return the indel length (or lengths if a site is not biallelic).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3409 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 13:02:01 +00:00
depristo 6faf101c6c Minor improvements to Callable Loci for public consumption
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3408 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 12:50:11 +00:00
hanna 388dd8d64d Fixing bugs in downsampler introduced when I added Ryan's dup eliminator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3407 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-21 02:53:12 +00:00
depristo a10fca0d5c Genotyper now is using bytes not chars. Passes all tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3406 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 21:02:44 +00:00
hanna 7389077b3b A few misc usability fixes:
- Clarify the message emitted when -XL is supplied so I don't spend another half day chasing a bug that doesn't exist.  
- Crash with a helpful message when running -nt with non-TreeReducible walkers.
- Crash with a helpful message when running -nt with reduceByInterval walkers.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3405 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 19:02:02 +00:00
aaron b543dd4ac4 more aggressive checks for the locking, and some more documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3404 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 16:16:36 +00:00
depristo 1ab00e5895 Retiring multi-sample genotyper
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3401 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 14:10:56 +00:00
depristo 727822adb4 BaseUtils has a clearer distinction between byte and char routines. All char routines are @Deprecated now. Please use bytes. Better organization of reverse(), now in Utils not BaseUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3400 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 14:05:13 +00:00
depristo 6ce3835622 Removing unused methods in QualityUtils; ReferenceContext now converting all bases to upper case, but can be disabled with static boolean
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3399 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 12:38:06 +00:00
depristo 5abac5c057 A few more char -> byte cleanups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3398 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-20 00:02:06 +00:00
depristo 8a725b6c93 Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[] backed not char[]. Please no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 23:27:55 +00:00
aaron 02cc1afdc8 remove RodBed and all its dependencies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3396 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 19:12:30 +00:00
chartl ffb1b46166 Added a GCCalculatorWalker for a oneoff analysis for Mark Daly (GC content of agilent 1.1 targets)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3395 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 18:49:51 +00:00
aaron 0036df7b03 adding a convenience method for getting at the RODs that overlap a specific location as GATKFeatures.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3394 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 17:40:20 +00:00
aaron ca386439be only emit a warning if the tribble index is out of date, don't remove and replace it for them. Added a test case where the log4j appender checks the logging messages for the appropriate output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3393 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 15:12:48 +00:00
hanna 017ab6b690 Experimental versions of downsampler and Ryan's deduper are now available either
as walker attributes or from the command-line.  Not ready yet!  Downsampling/deduping 
works in a general sense, but this approach has not been completely optimized or validated.
Use with caution.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3392 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 05:40:05 +00:00
weisburd 46ba88018d Updated to the new readHeader(..) api
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3391 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 04:06:34 +00:00
weisburd 984c51efd3 Updated to use Tribble-based GATKFeature instead of TabularROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3390 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:42:12 +00:00
weisburd 42ee16f256 Updated to use Tribble-based GATKFeature instead of TabularROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3389 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:41:37 +00:00
weisburd d8469e2fba Updated to use Tribble-based GATKFeature instead of TabularROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3388 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:40:47 +00:00
weisburd d65b2d32d1 Removed AnnotatorROD which has been ported to Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3387 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:39:34 +00:00
weisburd b82116f488 Removed AnnotatorROD which has been ported to Tribble
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3386 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:39:20 +00:00
weisburd 6b96f025f5 Tribble integration for indexing the AnnotatorInputTable format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3385 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:37:54 +00:00
weisburd 2f3933148d Added fast split(str, delimiter) method
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3384 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 03:37:26 +00:00
hanna aedb9f6734 Bring SAMPileupCodec into compliance with new interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3383 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 01:23:29 +00:00
aaron 7cfb9ff3dc updates for Tribble 82, fixes for Ryan's case where multiple processes would attempt to read/write to the same index, and a couple other Tribble-centric bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3382 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 19:34:45 +00:00
chartl 635f61c22d Clone the other guy too
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3381 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 18:56:01 +00:00
rpoplin 9e15299475 Misc cleanup in variant recalibrator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3380 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 17:37:01 +00:00
chartl eb200e4cce Hrumph. Don't just add pointers to the same objects, actually clone the underlying arrays.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3379 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 17:13:44 +00:00
chartl e016491a3d Major refactoring of Depth of Coverage to allow for more extensible partitions of data (now can do read group, sample, and library; in any combination; adding more is fairly easy). Changed the by-gene code to use clones of stats objects, rather than munging the interval DoCs. (Fix for Avinash. Who, hilariously, thinks my name is Carl.) Added sorting methods to ensure static ordering of header and body fields.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3377 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 16:58:13 +00:00
weisburd 3c022e4b0c Improved command-line-arg validation at startup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3374 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:46:17 +00:00
weisburd 35b4bba35e Refactored so it could be used for knownGene and CCDS as well as refGene
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3372 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:44:10 +00:00
weisburd bb86c0e03a Improved error message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3371 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:43:13 +00:00
weisburd 68719615be For multiple matches, shifted counter to be 1-based
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3370 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-18 02:41:50 +00:00
hanna 73e2e32837 Fix typo.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3369 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:04:00 +00:00
chartl ebd0fabf86 First pass updates to annotations to work with indels. HomopolymerRun indel behavior is currently turned off by a global boolean until it's ready to go live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3368 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:02:13 +00:00
hanna 0791beab8f Checking in downsampling iterator alongside LocusIteratorByState, and removing
the reference implementation.  Also implemented a heap size monitor that can
be used to programmatically report the current heap size.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3367 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-17 21:00:44 +00:00