streaming/piping VCFs into the GATK. Notable changes:
- Public interface to RMDTrackBuilder is greatly simplified; users can use it only to build
RMDTracks and lookup codecs.
- RODDataSource and RMDTrack are no longer functionally at the same level; RODDataSources now
manage RMDTracks on behalf of the GATK, and the only direct consumers of the RMDTrack class
are the walkers that feel the need to access the ROD system directly. (We need to stamp out
this access pattern.
A few minor warts were introduced as part of this process, labeled with TODOs. These'll be
fixed as part of the VCF streaming project.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4915 348d0f76-0448-11de-a6fe-93d51630548a
BAQ has generally useful improvements. Refactor code to make it easier for BAQUnitTest to run. minBaseQuality enforced on output, as well as input now. Added BAQUnitTest that checks that the BAQ calculation is performing as expected. Still needs to be expanded significantly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4804 348d0f76-0448-11de-a6fe-93d51630548a
SAMFileWriterStub now supports BAQ writing as an internal feature. Several walkers have the @BAQMode applied to this, with parameters that I think are reasonable. Please look if you own these walkers, though
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4798 348d0f76-0448-11de-a6fe-93d51630548a
This commit adds two important classes: Sample, which contains data about one sample; and SampleDataSource, which manages sample data a la ReferenceDataSource and ReadsDataSource.
This code should be stable, but it has not been integrated with existing walkers yet. That's the next commit.
In the meantime, feel free to experiment with the code - there are two basic example walkers in the playground.sample package. And PLEASE let me know if you see any errors/inconsistencies.
Note that this also adds a new dependency on SnakeYaml, a YAML parser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4285 348d0f76-0448-11de-a6fe-93d51630548a
(-B:name,type file) as well as the old syntax. Also, a bonus feature: BAMs can now be tagged at the
command-line, which should allow us to get rid of some of the hackier calls in GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4105 348d0f76-0448-11de-a6fe-93d51630548a
Some refactoring of Queue extensions for reusability in scripts.
Putting the extensions into the Queue.jar after building them.
More updates to GATK walker arguments specifying @Input and @Output for Queue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4032 348d0f76-0448-11de-a6fe-93d51630548a
Changes to StingUtils command line for use in Queue, replacing Queue's use of property files.
Updates to walkers used in existing QScripts to add @Input/@Output.
RMD used in @Required/@Allows now has a new default equal to "any" type.
New QueueGATKExtensions.jar generator for auto wrapping walkers as Queue CommandLineFunctions.
Added hooks to modify the functions that perform the Scattering and Gathering (setting their jar files, other arguments, etc.)
Removed dependency on BroadCore by porting LSF job submitter to scala.
Ivy now pulls down module dependencies from maven.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3984 348d0f76-0448-11de-a6fe-93d51630548a
- changed to a better method for getting headers from Codecs
- some removal of old commented out code in the GATKAgrumentCollection
- changes for the rename of FeatureReader to FeatureSource
- removed the old Beagle ROD
- cleaned up some of the code in SampleUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3826 348d0f76-0448-11de-a6fe-93d51630548a
- Remove some old debugging cruft regarding handling of threaded engine exceptions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3796 348d0f76-0448-11de-a6fe-93d51630548a
impacts performance on low-pass data and, if it works well, will create a more automatic version of the tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3709 348d0f76-0448-11de-a6fe-93d51630548a
are gone where I could identify them, but hierarchies that split to support two sharding systems have
not yet been taken apart.
@Eric: ~4k lines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3580 348d0f76-0448-11de-a6fe-93d51630548a
as walker attributes or from the command-line. Not ready yet! Downsampling/deduping
works in a general sense, but this approach has not been completely optimized or validated.
Use with caution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3392 348d0f76-0448-11de-a6fe-93d51630548a
unicode quote characters embedded in it. These characters were invisible inside
IntelliJ but cause compile warnings for Ryan and Aaron, who for whatever reason
have a different default charset. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3203 348d0f76-0448-11de-a6fe-93d51630548a
sorting and merging interval lists, and creating RODs from intervals. This
gives Doug the ability to keep using our interval list parsing code when
sorting intervals on our behalf.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3159 348d0f76-0448-11de-a6fe-93d51630548a
Using this feature, sites covered by the target ROD will be iterated over. This list of intevals generated is merged with any intervals from the -L and -XL args, and the Walker is run over the resulting merged list.
WARNING: for very large ROD's this can be costly. Consider this experimental for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3134 348d0f76-0448-11de-a6fe-93d51630548a
-Check that base and qual strings are the same lengths
-Fix one more bug in the clipper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3006 348d0f76-0448-11de-a6fe-93d51630548a
* Moved GATKArgumentCollection into gatk.arguments folder to clean up the main folder, also added some associated argument classes (most of the changes).
* Added code the argument parsing system for default enums, we needed this so we could preserve the current unsafe flag, and at the same time allow finer grained control of unsafe operations. You can now specify:
"-U" (for all unsafe operations), "-U ALLOW_UNINDEXED_BAM" (only allow unindexed BAMs), "-U NO_READ_ORDER_VERIFICATION", etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2586 348d0f76-0448-11de-a6fe-93d51630548a