hanna
f18ac069e2
A refactoring / unification of ReadBackedPileup and ReadBackedExtendedEventPileup.
...
Provides a cleaner interface with extended events inheriting all of the basic RBP
functionality. Implementation is still slightly messy, but should allow users to
provide separate implementations of methods for sample split pileups and unsplit
pileups for efficiency's sake.
Methods not covered by unit/integration tests have not been sufficiently tested yet.
Unit tests will follow this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3597 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-20 04:42:26 +00:00
hanna
c3b68cc58d
Rethinking DownsamplingLocusIteratorByState with a flattened read structure. Samples are kept
...
independent while processing, and only merged back in a priority queue if necessary in a special
variant of the ReadBackedPileup. This code is not live yet except in the case of naive deduping.
Downsampling by sample temporarily disabled, and the ReadBackedPileup variant is sketchy and
not well integrated with StratifiedAlignmentContext or the walkers. Cleanup to follow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3540 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-13 01:47:02 +00:00
depristo
dfc36c1e95
Restructuring of the mandatory read filters for traversals. Now everything uses ReadFilters, even for the required filters like being mapped for LocusWalkers. Statistics now tracked for each read filter used during the traversal and info emitted in INFO at the end.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3445 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-26 22:12:25 +00:00
hanna
ee3f2eb1d0
Don't output traversal reduce result in the logger. In many cases, the reduce
...
result is tangential to the product of the analysis and having the logger always
emit it can confuse the output (such as in the new reduceByInterval
DepthOfCoverage walker). If users want to emit it, they can choose not to override
onTraversalDone, or override onTraversalDone and write results to the output
stream / logger / whatever their choice.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3422 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-23 22:41:43 +00:00
depristo
8a725b6c93
Restructuring of ReferenceContext and ReadWalkers to accept a ReferenceContext. Now ReferenceContext is byte[]-backed, not char[]-backed. Please, no more chars for the reference. All of the tests pass now. Coming check-ins are going to clean up the char / byte problems in the GATK.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3397 348d0f76-0448-11de-a6fe-93d51630548a
2010-05-19 23:27:55 +00:00
ebanks
1e8b3ca6ba
Fare thee well, oh LocusWindowTraversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3089 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:17:26 +00:00
hanna
b4b4e8d672
For Sarah Calvo: initial implementation of read pair traversal, for BAM files
...
sorted by read name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3052 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-21 23:22:25 +00:00
hanna
a7ba88e649
Rework the way the MicroScheduler handles locus shards to handle intervals that span shards
...
with less memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2981 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-11 18:40:31 +00:00
hanna
02f48b6457
Fix bug that's been in the GATK for a very long time: update nReads (as well
...
as nRecords), so that INFO logging doesn't say 'skipped 0 of 0 reads'. While
I'm in there, update TraversalStatistics to store longs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2959 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 22:44:54 +00:00
aaron
790d2a7776
adding the initial ROD for Reads support; more convenience methods in ReadMetaDataTracker to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2918 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-03 15:56:44 +00:00
hanna
199b43fcf2
Reduce by interval alterations to interface with new sharding system. This checkin will be followed by a
...
simplification of some of the locus traversal code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2886 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 00:16:50 +00:00
depristo
3d45457595
VariantEval2 test framework implemented; Kiran is experimenting with the system. Not for use by anyone else. VariantContext appears to work well; I'll release it next week for general use following docs of the functions. Removing newvarianteval and other classes to avoid any future confusion. Update to TraverseLoci and RodLocusView to simplify a few functions and to correct some minor errors. All tests pass without modification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2748 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-30 20:51:24 +00:00
hanna
3d922a019f
Basic support for very simple index-driven locus traversals. Interface has been changed to
...
support batched intervals in a single shard, but intervals are not yet compressed into a single
shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2730 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-29 03:14:26 +00:00
rpoplin
79c4cc1db7
AnalyzeAnnotations now breaks out titv by calls in hapmap and also plots true positive rates. Any ROD passed in whose name starts with 'truth' is considered to be the truth set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2726 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 21:41:23 +00:00
asivache
c8c5c176cd
-erw argument, begone! Rod traversals are now enabled. current tests pass, more tests for RODWalkers are welcome ;)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2600 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 21:07:49 +00:00
hanna
b19bb19f3d
First successful test of new sharding system prototype. Can traverse over reads from a single
...
BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2587 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-15 03:35:55 +00:00
asivache
d01bde36a4
Make sure that reference view holds enough bases to pass full-length deleted sequence to the walker's map() function in extended event mode (this addresses the problem of a deletion crossing the shard's boundary, so that an attempt to extract deleted bases results in a crash)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2543 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 22:37:22 +00:00
asivache
7f91b4d824
Bug fix. It would be nice if we could extract ROD annotations for the whole length of an extended event (indel), and we tried... But alas, it does not work with the current ROD system (after extracting length on ref > 1 ROD data for a deletion, rod iterator crashes on the attempt to re-load annotations for next reference base)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2541 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-07 21:30:55 +00:00
depristo
8d13597a27
Temporary command-line support to enable rod walkers; this is safe if you know what you are doing.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2505 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-06 12:15:36 +00:00
asivache
a18e31f5b8
If alignment context at the locus holds extended event, get rod metadata and (importantly) reference bases for the whole span of the event (if it is a deletion that is, insertions still have length 0 on the ref!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2489 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:56:25 +00:00
hanna
05deb8796b
Simplify handling of reference sequence for unmapped reads. Improvement made based on a suggestion from Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2475 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:06:20 +00:00
depristo
87e863b48d
Removed unused routines in duputils; duplicatequals to archive; docs for new duplicate traversal code; general code cleanup; bug fixes for combineduplicates; integration tests for combine duplicates walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2468 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 19:46:29 +00:00
depristo
fcc80e8632
Completely rewritten duplicate traversal, more free of bugs, with integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:56:49 +00:00
hanna
e6127cd6c5
Temporary hack for Tim Fennell: introduce a sharding strategy that stuffs all data into a single
...
shard for cases when the index file isn't available. Works for the case in question, but is not
guaranteed to work in general. Will be replaced once the new sharding system comes online.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2383 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:55:42 +00:00
aaron
86dc98bfb5
update the documentation for CombineDuplicates for the new help system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2324 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:01:42 +00:00
aaron
f64a4c66ac
some tweaks for the GATK paper genotyper to better work with shared memory parallelization, added documentation changes for Matt's new help system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2319 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 22:33:51 +00:00
hanna
4c147329a9
Turn javadoc comments for packages and classes into key/value pairs in a properties file. Embed the properties file
...
in GenomeAnalysisTK.jar. Still no support for actually displaying the archived javadoc. Also change the approach
to providing package javadocs: retired the deprecated package.html file in favor of Java1.5-style package-info.java.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2263 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 20:08:41 +00:00
depristo
db40e28e54
ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 20:54:44 +00:00
aaron
c3c001e02e
cleanup of the traversal output code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2026 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 06:18:10 +00:00
aaron
f9819d5f13
a little clean-up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2007 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 06:18:34 +00:00
aaron
2ed423ed56
print the current location in read walkers (in addition to the number of reads processed), along with some refactoring to support the change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2006 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 05:57:01 +00:00
depristo
86573177d1
Reverting rod walkers to use underlying refwalker implementation while we work on ROD2 and reenable the system. Added some serious sparse file parsing to variant eval tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1929 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 01:04:37 +00:00
aaron
3fb3773098
a fix for traverse duplicates bug: GSA-202. Also removed some debugging output from FastaAltRef walker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1912 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-26 20:18:55 +00:00
depristo
caa3187af8
Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 23:31:13 +00:00
aaron
d21b582b18
fix for a memory leak where the Resource Pool was releasing based on the value and not the key, resulting in the resourceAssignments map growing with each additional shard
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1880 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-20 00:39:42 +00:00
depristo
6e13a36059
Framework for ROD walkers -- totally experimental and not working right now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
hanna
48713e154c
Windowed access to the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
hanna
21d1eba502
Cleaned division of responsibilities between arguments to map function. Reference has been changed
...
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
hanna
5429b4d4a8
A bit of reorganization to help with more flexible output streams. Pushed construction of data
...
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
asivache
a361e7b342
SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1315 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:59:49 +00:00
aaron
9cfd89c54f
a small refactoring, and some documentation cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1210 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 22:03:45 +00:00
aaron
d86717db93
Refactoring of the traversal engine base class, I removed a lot of old code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1209 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 21:57:00 +00:00
hanna
433ad1f060
Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 18:49:08 +00:00
hanna
5735c87581
Basic infrastructure for filtering malformed reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1178 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:50:22 +00:00
ebanks
e5e249d4ac
temporary fix to deal with screwy SOLiD reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1168 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 03:25:57 +00:00
hanna
9b182e3063
Prep for documenting command-line arguments: delete some arguments that don't make sense any more given
...
the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max
OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1144 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:23:35 +00:00
hanna
a3e0ec20c4
Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1138 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:46:35 +00:00
aaron
f5cba5a6bb
Fixed genome loc to be immutable; the only way to change its values now is through the GenomeLocParser.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
ebanks
95e2ae0171
Deal with reads whose ends are aligned off the end of a chromosome.
...
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
hanna
491ed70b44
TraverseByLocusWindow -- assorted bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1109 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:51:38 +00:00