asivache
1bd4c0077c
Now that ROD system supports overlapping RODs, we do not need rodRefSeq to be too smart and read in all the overlapping records (transcripts) on its own; leave it to the generic ROD mechanism.
...
PARTIAL commit; new, simpler rodRefSeq will reappear in a seq.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1694 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:16 +00:00
aaron
11c32b588f
fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:54:47 +00:00
ebanks
0748d80baa
Added a convenience method in rodDbSNP to deal with Andrey's changes to the rod. Now you can just ask for the first real SNP rod from the list and not have to think about how it works.
...
CountCovariates uses it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1688 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:15:40 +00:00
ebanks
682b765536
bug: need to upper case chars so that == works throughout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1684 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:20:43 +00:00
asivache
57d31b8e9b
Filter that discards reads from specific lanes; and also its friend that helps blacklisting a set of lanes from GATK command line a one-liner.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1681 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:06 +00:00
ebanks
5ce42cbab3
After thinking about this a bit more, it makes sense to pull this functionality out of my walker and into the GenomeLocParser where everyone else can benefit from it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1677 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 01:32:35 +00:00
ebanks
b1dc6d65e4
interval merging is now blazingly fast
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1674 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 21:15:04 +00:00
asivache
15135788ca
OK, let's bite the bullet. Now rodDbSNP objects are 'isSNP()' only when they are annotated as 'exact', not a 'range'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1673 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 19:25:16 +00:00
asivache
8ad181f46f
Note to myself: do 'ant clean' now and then or old versions of the code that suddenly became invalid will stick around. The world is not perfect, and neither is automatic dependency resolution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1672 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:40:52 +00:00
asivache
d2d1354199
Now uses BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1670 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:49 +00:00
asivache
29adc0ca1c
Little class that can be used to simulate the results returned by the old ROD system. This is needed to keep couple of tests from breaking. All the code that uses this class must be changed urgently to accomodate the data as returned by new ROD system, and the corresponding tests (MD5 sums) have to be modified as well since some data as seen through the new ROD system is indeed different.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1668 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:58:56 +00:00
asivache
a6bd509593
Changing the carpet under your feet!! New incremental update to th eROD system has arrived.
...
all the updated classes now make use of new SeekableRodIterator instead of RODIterator. RODIterator class deleted. This batch makes only trivial updates to tests dictated by the change in the ROD system interface. Few less trivial updates to follow. This is a partial commit; a few walkers also still need to be updated, hold on...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1667 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:55:22 +00:00
asivache
4c67a49ccb
Removed unused imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1666 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:45:22 +00:00
hanna
e7f44ada98
Make unpackList public static so that Doug can use it in the scatter/gather framework.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1665 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 15:32:49 +00:00
ebanks
7b627fd622
Check for empty interval lists to merge
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1664 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 04:34:26 +00:00
hanna
7f5778c966
Update gsadevelopers -> gsahelp.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1663 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-20 23:36:54 +00:00
aaron
3a487dd64e
little fixes; also fixed a tyPo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:38:51 +00:00
aaron
b6d7d6acc6
fix for the eval tests, and a change to the backedbygenotypes interface, more changes to come
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1661 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:25:16 +00:00
depristo
4318f75910
tiny cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1660 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:04:25 +00:00
depristo
3a341b2f06
Fixes for VariantEval for genotyping mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
aaron
7b39aa4966
Adding the VCF ROD. Also changed the VCF objects to much more user friendly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
ebanks
b19fd4d45c
Damn unit tests have a null Toolkit()...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1654 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 17:10:49 +00:00
ebanks
90626c843d
oops - we don't need reference bases, but we still need reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1653 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:24:45 +00:00
ebanks
2b2df4e1ba
- Fix the CleanedReadInjector to deal with -L intervals correctly.
...
- Some walkers don't use the ref base, so speed up traversals by not requiring it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1652 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:17:58 +00:00
asivache
94618044e8
Starting an update of ROD system. These basic classes will completely replace old ones, but with this update they are not linked to anything, so this checkpoint should be safe.
...
The main reason for the change is that there can be (and are!) multiple RODs overlapping with a single reference base position in a single track. There can be two "trivial" RODs at the same location (e.g. samtools pileup will have two point-like records at putative indel sites: one for the reference, the other one for the indel itself). Or there can be one or more "extended" RODs (length >1), eg. dbSNP can report an indel at Z:510-525 AND a SNP at Z:515.
The ReferenceOrderedDatum object (and children) will not be changed, but it is now explicitly interpreted as a single data *record*, possibly out of many available from a given track for the current site. As long as single data record occupies one line in a data file, the new ROD system will take care of loading and keeping multiple records, including extended (length > 1) ones, and will automatically drop the records when they finally go out of scope. For one-line-per-record, multiple-records-per-site RODs, there is no need anymore for the hack used so far that involved passing ROD's own implementation of iterator through reflection mechanism (though it will still work)
* RODRecordList:
the ROD system (its iterators) will now always return a LIST of all RODs available at current position or at current query interval (see below). This class is a trivial wrapper for a list of ROD objects, with added location argument for the whole collection. The location of the RODRecordList is where the ROD system is currently sitting at: a single, current base on the reference (if next() traversal is performed), or the location of the query interval when returned by seekForward() (see below). The ROD objects themselves will have their locations set according to the original data in the file. Hence, perusing the above example of a dbSNP indel at Z:510-525 and SNP at Z:515, when moving to the position Z:515 the ROD system will return a RODRecorList with location Z:515, and with two ROD objects packaged inside, one with location Z:510-525, the other with Z:515.
*RODRecodIterator:
Almost identical to old SimpleRODIterator used by ReferenceOrderedData; this is a low-level iterator that walks over records in the data file (with a callback to ROD's ::parseLine() to parse real data)
*SeekableRODIterator:
a decorator class that wraps around Iterator<ROD> (such as RODRecordIterator) and makes the data traversable by reference position, rather than record by record. This is reimplementation of the old RODIterator. SeekableRODIterator's ::next() moves to the next position on the ref and returns all RODs overlapping with that position (as a RODRecordList). This iterator also adds a seekForward(loc) operation, that allows fast forwarding to a specified position or interval. Length > 1 query arguments (extended intervals) are fully supported by seekForward(), the returned RODRecordList wil contain all RODs overlapping with the specified interval, and the location of the returned RODRecordList object will be set to that query interval. NOTE: it is ILLEGAL to perform next() after a seekForward() query with length > 1 interval. seekForward() with point-like (length=1) interval reenables next().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1650 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 15:58:37 +00:00
hanna
355136928e
Play nice with other jobs in this VM -- don't close stdout / stderr.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1646 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 18:55:08 +00:00
ebanks
5d85bd9671
By default, VF should ask for deleted bases so that they show up in coverage.
...
The Strand filter then needs to ignore those bases when determining bias.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:46:09 +00:00
hanna
01a9b1c63b
Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 08:33:56 +00:00
hanna
9f7cf73411
Output stream management fixes. I completely screwed up the output stream management system, but cleverly masked this fact by breaking some other stream management functionality that masked the problem.
...
Sigh.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1630 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:06:45 +00:00
hanna
17758b381c
Properly initialize redirected output streams in case of out and err.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1629 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 19:47:43 +00:00
andrewk
00dfe014b7
Added option to FastaReferenceWalker to change output FASTA file format's line width and to remove header lines; allows dumping raw sequence using intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1628 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 18:00:30 +00:00
hanna
b69eb208a6
Always create output files, even if no output was written to them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 17:58:14 +00:00
aaron
b401929e41
incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
ebanks
01e7b39c8d
1. Don't print out values in filter field of the VCF.
...
2. Fix ratio printouts (for params file)
3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:03:39 +00:00
ebanks
436f543b3b
I owe Doug a beer for finding this:
...
don't print out intervals to be merged if they're not within the global -L intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:22:30 +00:00
aaron
e03fccb223
Changes to switch Variant Eval over to the new Variation system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 05:34:33 +00:00
aaron
5b41ef5f70
rod DBSNP had a bug where the reference wasn't calculated correctly under certain conditions. Fixed getRefBasesFWD and getRefSnpFWD so that they were more in line with getAltBasesFWD and getAltSnpFWD. Also updated Variant Eval tests to reflect this change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1609 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 23:48:58 +00:00
ebanks
c669e8d5ad
Use constant seed in the random generator so we can be stable (and thus unit tests will work)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1607 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 17:40:56 +00:00
depristo
6c7a300664
Missing file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1601 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:17:09 +00:00
depristo
6e13a36059
Framework for ROD walkers -- totally experiment and not working right now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
depristo
e8d544869d
Alignment context now supports the idea of skipped bases -- not currently in use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1598 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:11:38 +00:00
depristo
3949b4ac72
commented out version of next() and hasNext() that appear to be correct but are causing testing problems
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1596 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:09:21 +00:00
depristo
58105636c8
getBoundRods() convenience method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1595 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:57 +00:00
depristo
4e1eded389
Fixed bad compareTo operator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1594 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:10 +00:00
depristo
7c8b17b456
fix for SSG with pl name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1591 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 20:39:34 +00:00
chartl
d6a0b65ac9
Changes:
...
Rollback of Variant-related changes of r1585, additional PGC code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 16:23:01 +00:00
chartl
0c54aba92a
Changes:
...
@VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object.
@AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones.
@ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true.
Added:
@PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls
and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 15:01:50 +00:00
aaron
5a64a80ab5
changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test from using the build script.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:31:33 +00:00
depristo
c988205884
Notes for Aaron in SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:18:51 +00:00
ebanks
1362a56227
Added fasta tests and small fix to cleaner test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1575 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:13:11 +00:00
depristo
0093482c62
N reference base fix for SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1572 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 21:19:36 +00:00
ebanks
cb31d5a0ab
VariantFiltration now outputs VCF. Important changes:
...
1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD.
2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF.
If there are any problems, go find Aaron.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:32 +00:00
asivache
dd0085c428
1) now is tolerant to sloppy cigar strings with 0-length elements (at the price of extra recursive call)
...
2) when reads with deletions are requested, adds to the pile just those: reads with 'D' over the current reference base, but not 'N'
3) next() now implements a loop: recursive forward iteration calls to next() until ref. position with non-zero coverage is encountered were OK for (short) deletions, but with long stretches of N's they end up with stack overflow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1568 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:04 +00:00
ebanks
542af6402e
output correct format for Sequenom SNPs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1567 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 19:21:53 +00:00
kiran
3b1e966b4c
Lowercases the sequencing platform so that a difference in case doesn't lead to the failure to look up an entry in the hash.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1565 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:35:45 +00:00
kiran
d82d6c0665
Excludes variants that fall below a certain LOD that changes as a function of depth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1564 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:34:16 +00:00
kiran
06eae52292
Throws an exception if you attempt to use a filter that doesn't exist.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1563 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:33:27 +00:00
asivache
1060b36288
Bug fix: 'N' cigar elements now treated properly; for all practical intents and purposes, N is the same as D and should be treated as such, the difference is only in logical interpretation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1562 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:08:35 +00:00
chartl
9d69bd2c84
Modifications:
...
@CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position
@VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As now, the variable is unused.
@AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them
New Additions:
Filtered Contexts
Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as
part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes
it a bit easier, as each of these objects are children of AlignmentContext, and can be instantiated from a "raw" AlignmentContext.
@FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering
those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can
be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method.
@ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction
@ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction
@QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided).
A unit test bamfile and associated unit tests for these are in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:49:52 +00:00
depristo
d9588e6083
bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:36:12 +00:00
asivache
df11618092
Set default value of useLocusIteratorByHanger to FALSE. Otherwise the -LIBH flag is useless and there'd be no wayto "unset" the 'true' value. Old version was (always) using LocusIteratorByHanger. Now default iterator is indeed LocusIteratorByState, and -LIBH will switch back to the old one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1556 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:09:09 +00:00
depristo
eeb9b6eb13
GenotypeLikelhoods now support a cache per subclass, avoiding genotyping clashes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1554 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 10:39:14 +00:00
ebanks
0cc219c0df
-Added unit test for walkers dealing with intervals for cleaning
...
-I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been. Hooray for unit tests!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 02:35:17 +00:00
depristo
ec0f6f23c7
LocusIterationByState is now the system deafult. Fixed Aaron's build problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
ebanks
5dbba6711c
Lots of changes: (I'll send email out in a sec)
...
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run though it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mox up values (like I had been doing).
3) Have indel rod print samples
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:12:09 +00:00
depristo
1c3d67f0f3
Improvements to the CountCovariates and TableRecablirator, as well as regression tests for SLX and 454 data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
depristo
2b0d1c52b2
General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:13:37 +00:00
aaron
0cc634ed5d
-Renamed rodVariants to RodGeliText
...
-Remove KGenomesSNPROD
-Remove rodFLT
-Renamed rodGFF to RodGenotypeChipAsGFF
-Fixed a problem in SSGenotypeCall
-Added basic SSGenotype Test class
-Make VCFHeader constructors public
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 18:40:43 +00:00
ebanks
fd1c72c151
Fixed package name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1535 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:40:06 +00:00
ebanks
6c476514f8
Moved to core. Wiki pages are going up; unit tests will be written soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1533 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:09:11 +00:00
ebanks
849dce799d
This rod was all wrong for generating the alternate snp alleles (it returned null or even the wrong value); fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1531 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 14:21:46 +00:00
depristo
a08c68362e
Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* the compares the geli MD5 sum to the expected one!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 12:39:06 +00:00
aaron
3c2ae55859
changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 05:31:15 +00:00
ebanks
5bd99fc1c4
VariantFiltration moved to core.
...
Another win for the team.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1517 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:41:41 +00:00
depristo
bdd0a6f9fa
change to make build work
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1511 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 13:43:10 +00:00
depristo
b01ac9de0c
High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and more potentially) speed ups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still testing but passes validing pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 03:06:25 +00:00
ebanks
3dfc77dc89
Add an indel rod which represents the initial point of the indel only
...
(useful for alternate reference making)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1507 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 19:32:29 +00:00
aaron
0e6feff8f2
fixed locus pile-up limiting problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1505 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 16:56:44 +00:00
aaron
05c164ec69
changing the default behavior to allow any sized read pile-up (which may exceed the memory limit); the user can then select their own read limit. The default of 100K was arbitrary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1498 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 14:46:00 +00:00
ebanks
54c0b6c430
Allow this ROD to consist of just the positions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1497 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 12:43:18 +00:00
aaron
4a1d79cd7b
added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads.
...
The user is warned if a locus exceeds this threshold, and no more reads are added.
Also CombineDup walker had an incorrect package name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 04:21:58 +00:00
ebanks
0addae967a
IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1495 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 03:34:39 +00:00
ebanks
8e3c3324fa
Added filter for SNPs cleaned out by the realigner.
...
It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1489 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:32:32 +00:00
ebanks
8bc7afe781
Smarter SW penalties
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1488 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:29:19 +00:00
ebanks
1a299dd459
Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1486 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 03:31:37 +00:00
ebanks
215e908a11
Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters.
...
This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 02:16:39 +00:00
depristo
813a4e838f
Removing old code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1482 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:27:11 +00:00
depristo
49a7babb2c
Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:16:30 +00:00
depristo
522e4a77ae
Caching support across multiple technologies
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1480 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 18:10:14 +00:00
depristo
5af4bb628b
Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1479 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 17:34:43 +00:00
depristo
bde67428fd
Better formatting of the code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1477 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-29 21:46:47 +00:00
aaron
8331c195fb
changed the full name of maximum_reads to maximum_iterations for consistancy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1475 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 16:03:46 +00:00
depristo
8e129d76fd
Support for original quality scores OQ flag. pQ flag in TableRecalibation to preserve quality scores below a threshold (defaulting to 5)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 14:14:21 +00:00
depristo
bf60980653
Experitmental support for empirical P(B_true | B_miscall). --useEmpiricalTransitions flag to SSG enables this support. Much better implementation of Genotype likelihoods -- the system should scream along now. Continuing progress towards deleting old model
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1469 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 00:17:24 +00:00
depristo
7cf9a54b64
change for new char/byte in BaseUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1467 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 23:47:56 +00:00
hanna
e5115409fa
Force columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1457 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 19:54:54 +00:00
hanna
ccdb4a0313
General-purpose management of output streams.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
aaron
cd711d7697
Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric.
...
also fixed some verbage in the GAEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:35:49 +00:00
aaron
6313c465fb
we want the RMS of the reads qualities not the RMS of the RMS of the read qualities.
...
Also the VCF version tag seems to be standardized as VCR. Updated the VCF code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 21:56:29 +00:00
ebanks
ed8c92a12a
make isReference do the right thing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1439 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 20:32:29 +00:00
ebanks
b3fe566c0c
Fix descriptions of walker args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1436 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 19:46:48 +00:00
ebanks
53153fcd79
Allow RODs to specify that incomplete records are okay (i.e. that they allow optional fields)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1433 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 15:26:10 +00:00
ebanks
b2a18a9d61
- first pass at a basic indel filter (for now, based on size and homopolymer runs)
...
- fix simple indel rod printout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 03:04:12 +00:00
jmaguire
1e8b97b560
quietly skip empty intervals files rather than crash.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1428 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 20:19:14 +00:00
jmaguire
92c63fb530
It's just "lod" not discovery_lod now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1427 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 18:44:09 +00:00
ebanks
df5744bcd3
update this walker so any variants can be passed in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1426 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 16:30:39 +00:00
aaron
d101c20b30
added the ability to pass in a csv file of ROD triplets (one triplet per line) to the -B option
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1412 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 22:10:20 +00:00
ebanks
2c3f56cb8d
fix length calculation (it was including +/- char when it shouldn't)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1410 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 20:28:24 +00:00
aaron
fc1c76f1d2
fixing a bug where reads in overlapping interval based locus traversals could get assigned to only one of two the regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1407 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 17:50:16 +00:00
ebanks
ecae619a1b
warn user when dbSNP rod looks suspicious
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1400 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:20:20 +00:00
asivache
2841e151d0
javadoc comments only
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1399 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 18:44:35 +00:00
ebanks
02f1af0743
Don't die when a readgroup is absent from the covariates table - it could
...
happen when all reads are unmapped (or have MQ0); instead, just don't alter
the quals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1394 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 03:10:33 +00:00
depristo
6d3ef73868
Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:37:07 +00:00
depristo
20baa80751
Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1391 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:01:04 +00:00
depristo
bbd7bec5db
Continuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1387 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 22:25:30 +00:00
hanna
48713e154c
Windowed access to the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
depristo
65e9dcf5b7
Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which we being counted against everyone (5) likely found a stats bug in which heterozyosity was being applied incorrectly to the genotype priors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 01:00:55 +00:00
depristo
4dc23f2763
Trivial formatting changes as I moved more legacy code into this system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1381 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:54:26 +00:00
depristo
34af669dbb
Explicit ENUM representation of the diploid genotypes. Please use this from now on to represent strings like AA or AT
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1380 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:53:43 +00:00
hanna
21d1eba502
Cleaned division of responsibilities between arguments to map function. Reference has been changed
...
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
depristo
20ff603339
New hotness and old and Busted genotype likelihood objects are now in the code base as I work towards a bug-free SSG along with a cleaner interface to the genotype likelihood object
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1372 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 23:07:53 +00:00
depristo
4986b2abd6
Fixing bug in SSG -- genotyping and discovery were mixed up by name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1371 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 22:13:35 +00:00
depristo
3485397483
Reorganization of the genotyping system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:55:31 +00:00
depristo
d840a47b11
Slight reorganization of genotype interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:17:15 +00:00
ebanks
4366ce16e0
Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1339 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 14:53:27 +00:00
ebanks
feb7238f10
Wasn't always returning the correct alt base
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1337 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 03:08:04 +00:00
hanna
5429b4d4a8
A bit of reorganization to help with more flexible output streams. Pushed construction of data
...
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
hanna
7a13647c35
Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
...
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
ebanks
3c4410f104
-add basic indel metrics to variant eval
...
-variants need a length method (can't assume it's a SNP)!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
aaron
f1109e9070
Added the interator to SAMDataSource to prevent seeing dupplicate reads, only in a byReads traversal. The iterator discards any reads in the current interval that would have been seen in the previous interval.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1317 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-25 22:36:29 +00:00
asivache
a361e7b342
SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1315 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:59:49 +00:00
hanna
2024fb3e32
Better division of responsibilities between sources and type descriptors.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1314 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:15:57 +00:00
ebanks
59f0c00d77
-set indel cleaning walkers to be in core package
...
-move Andrey's alignment utility classes to core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1307 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 05:23:29 +00:00
aaron
0b16253db3
an iterator to fix the problem where read-based interval traversals are getting duplicate reads because reads span the two intervals.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1305 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 23:59:48 +00:00
ebanks
477502338f
moved major indel cleaning pieces to core (yippee!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1301 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:59:51 +00:00
ebanks
4efe26c59a
Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found.
...
Minor: refactor 454 filtering
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:53:51 +00:00
ebanks
ee8ed534e0
print full genotype for alt allele
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1297 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 01:35:23 +00:00
depristo
9c12c02768
AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1294 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 17:54:44 +00:00
hanna
6e4fd8db4a
Better formatting of available walkers, and only output them along with help. Cleanup JVMUtils.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1290 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 22:23:28 +00:00
depristo
761d70faa1
Better printing of multiple rods -- now produces a comma-separated set of values
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1289 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:58:27 +00:00
depristo
8588f75eb6
Better printing with toSimpleString() -- now prints out chip-genotype string
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1288 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:57:59 +00:00
hanna
1843684cd2
Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1287 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:50:51 +00:00
hanna
b43925c01e
Switched to Reflections ( http://code.google.com/p/reflections/ ) project for
...
inspecting the source tree and loading walkers, rather than trying to roll
our own by hand.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1286 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:32:22 +00:00
kiran
436a196e2b
Bug fixes to support hapmap genotyping concordance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1285 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 16:20:10 +00:00
aaron
f13a1e8591
adding a couple of small changes to support contract with VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1283 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 03:49:15 +00:00
aaron
b4adb5133a
GLF rod as a AllelicVariant object.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 00:55:52 +00:00
ebanks
54fce98056
duh, don't print newline
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1280 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-20 03:04:27 +00:00
ebanks
1d2b545608
add FLT toString method (to be used in PrintRODs) and add it to ROD list
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1279 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-20 02:47:50 +00:00
ebanks
387316ebe1
added indel rod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1276 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 16:05:51 +00:00
ebanks
da4af3b620
print indels in the format required for 1KG submissions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1275 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 15:59:18 +00:00