hanna
9b182e3063
Prep for documenting command-line arguments: delete some arguments that don't make sense any more given
...
the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max
OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1144 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:23:35 +00:00
hanna
a3e0ec20c4
Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1138 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:46:35 +00:00
aaron
f5cba5a6bb
Fixed genome loc to be immutable, the only way to now change it's values is through the GenomeLocParser.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
ebanks
95e2ae0171
Deal with reads whose ends are aligned off the end of a chromosome.
...
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
hanna
491ed70b44
TraverseByLocusWindow -- asstd bug fixes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1109 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:51:38 +00:00
ebanks
83816fb801
Stop using the annoying refIterator (temp change until new traversal is green lighted)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1103 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:05:39 +00:00
ebanks
0d9041380d
remove printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
hanna
102b38c055
Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1097 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:20:56 +00:00
ebanks
347608cfe0
remove hacked traversal in preparation for move to Matt's new one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1091 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:32:05 +00:00
aaron
8b4d0412ca
Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
aaron
bcb64d92e9
Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, seperating the creation of a genome loc (with the reference setup) to a parser class. GenomeLoc now just represents the actual genomic postion. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo
26eb362f52
Added novel / known split to variant eval. That is, emits all of the standard analyses on SNP partitioned into those known in the provided known db and those novel. Also fixed problem with counting bases within subsets
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
depristo
d3f0c51944
longer update times so we don't overwhelm when running genome-wide
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1067 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 14:10:02 +00:00
hanna
dde52e33eb
Cleanup of the cleaned read injector based on Eric's feedback.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
depristo
3c40db260d
Added REFERENCE_BASES required annotation for performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1047 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:03:57 +00:00
hanna
43a28750e0
Package level documentation -- helps new users get acclimated to the codebase more quickly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1029 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:27:48 +00:00
aaron
7db4497013
fixing the readTraversal output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1019 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 19:44:38 +00:00
aaron
63b5c12cbd
Changed dataSources to datasources, to be consistant with the rest of our package names. Also, this makes me champion in the largest check-in contest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
aaron
a62bc6b05d
fixed some documentation and attached a correct license
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@953 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:44:27 +00:00
asivache
9f35a5aa32
Insidious bug: clipped sequences (S cigar elements) where a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such sequence occured at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates interval reference stretch up to rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases, at the boundary of the interval the clipped bases would stick outside the passed reference stretch and index-out-of-bound exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
depristo
819862e04e
major restructuring of generalized variant analysis framework. Now trivally easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversional bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absense in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system autoatmically makese this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
ebanks
45eeefbb80
Deal with randomly occurring unmapped reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@906 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 02:55:53 +00:00
aaron
109bef6c08
We're no longer in the read-dropping business.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@901 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:37:51 +00:00
hanna
b7defeae83
Fix bug in unit tests created by new filter in TraversalEngine.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@868 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:50:44 +00:00
depristo
e0803eabd9
enabled underlying filtering of zero mapping quality reads, vastly improves system performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 14:51:08 +00:00
hanna
5e8c08ee63
Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
aaron
5c6163ecbf
Removing the old reads traversal.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@842 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:36:11 +00:00
aaron
3c3cd5bb64
Moving some of the data sharding around. A new shard catagory now exits, INTERVAL. This saved a lot of code that was mirroring the same approach in both the read and locus shard strategies.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@840 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:24:31 +00:00
hanna
2a5be1debe
Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@803 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:12:00 +00:00
depristo
30c63daf89
More improvements to the duplicate quality combiner, making progress towards a clean system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@788 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:57 +00:00
hanna
d61a5261c1
Better integration of reference-ordered data into the data sharding system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@779 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:09:32 +00:00
hanna
26dd3cd50e
Cleanup. Move filtering functions closer to where they're used.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@767 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 21:42:48 +00:00
hanna
96e73e496a
Delete deprecated old-school traversals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@758 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 14:57:17 +00:00
hanna
c10741e9f5
Rename TraverseLociByReference to TraverseLoci to represent its new function.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@743 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 01:31:57 +00:00
hanna
2c4de7b5c5
Switch TraverseByLoci over to new sharding system, and cleanup some code in passing read files along
...
the pathway from command line to traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@727 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:02:12 +00:00
aaron
d8c1b010f1
Fixing the naming of the function I checked in earlier.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@713 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:10 +00:00
ebanks
b62bddee42
The header was never being set.
...
Added this hack for now and will alert the authorities ASAP...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@708 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:18:51 +00:00
aaron
7aa90757ac
Moved the iterators over to the StingSAMIterator interface. This will help us ensure that iterators that need to be closed get closed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@702 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:52:18 +00:00
hanna
d14cab0be7
Added IterableLocusContextQueue and test. Cleaned up tests, adding BaseTest where it didn't exist. Enhanced test runner to run only classes ending in ...Test.java, so that utility classes can sit alongside the tests but won't be run by JUnit.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@693 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 21:32:05 +00:00
hanna
12ae3a22b6
Break locus context data access providers into modular components in preparation for traverse by loci.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@689 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 18:51:16 +00:00
aaron
6e69193e3c
Deprecated calls to getSamReader on both the GenomeAnalysisEngine and the TraversalEngine. This call fails in the new style traversals, but it won't disapear until the cut-over to the new traversals is complete.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@671 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 18:52:42 +00:00
ebanks
630066cc0a
1. Merge LocusWindows whose reads overlap.
...
2. Fix bug (we weren't clearing the "to emit" list)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@670 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 17:33:23 +00:00
hanna
ee9077fc69
LocusIterator iterated through LocusContexts, which was fine until now when we need something
...
that iterates through loci (GenomeLocs). Rename LocusIterator to LocusContextIterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@662 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:54:57 +00:00
hanna
608948210c
Check for a reference before extraction.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@661 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:29:44 +00:00
hanna
32696b13f5
Fixed method override issue with old-style traversals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@660 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:22:18 +00:00
hanna
862b8a6787
intervals_file + genome_loc => intervals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@659 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:04:18 +00:00
hanna
23e9e29964
Changed reads traversals from providing a LocusContext from which the reference sequence
...
could be extracted to a char[] containing the reference bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
aaron
e8b8ab5985
Added code to extend Matt's getReferenceBases out to the read walkers, so they can see the corresponding reference for each read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@652 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 03:42:38 +00:00
hanna
6e394490cb
Cleanup in preparation for ByLoci traversal. Also did some work minimizing unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@643 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:27:54 +00:00
ebanks
3aabc144c6
Added functionality to allow for a contract between LocusWindowTraversalEngine and LocusWindowWalker which allows the Walker to act upon reads outside of the provided intervals.
...
(Really, all we want to do is spit out all reads, but this allows the Walker to do other things with the reads if it wants)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@641 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 17:28:16 +00:00