ebanks
1905b5defa
Hash by chromosome for now to reduce memory. This is a temporary solution until we decide how to reture the Injector for good.
...
Also, with Picard's latest changes, we need to make sure we don't double-close the sam writer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1779 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:06:25 +00:00
ebanks
f9a1598d75
Reformatting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1778 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:03:34 +00:00
ebanks
203c626fc2
A wrapper around the GenotypeLikelihoods class for the UnifiedGenotyper. This wrapper incorporates both strand-based likelihoods and a combined likelihoods over both strands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1777 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 19:57:37 +00:00
sjia
5bdcc2b4dc
Included HLA class 2 genes in CreatePedFileWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1776 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:46:51 +00:00
sjia
8f896b734f
Included HLA class 2 genes in CreatePedFileWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1775 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:28:01 +00:00
aaron
f9a0eefe4b
GELI_BINARY is now functional, and can be used as a variant type in SSG (-vf=GELI_BINARY). Also fixed the max mapping quality column in both GELI output formats, we haven't been correctly outputing up until now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1774 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:20:34 +00:00
chartl
225b9bccc1
Modifications to NQSClusteredZScoreWalker to output empirical mismatch rates on bins by both Z-score and reported Q-score, rather than averaging over all Q-score bins for each Z-score.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1773 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 13:45:12 +00:00
depristo
8dd0924b37
Minor performance improvements to VariantEval -- now all of the CPU time is spent dealing with the ROD system...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1772 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 23:40:30 +00:00
aaron
4554ca1b28
more cleanup, depecaited the old genotype, corrected SNPCallsFromGenotypes' imports and two other classes that depend on it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1771 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 19:09:27 +00:00
aaron
3aec76136f
Removing the AllelicVariant interface, which is replaced by the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 17:44:24 +00:00
depristo
1bd0c3c145
variant eval allows non Variation rod objects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1768 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:26 +00:00
aaron
66fc8ea444
GSA-182: Adding support for BED interval files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1767 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 02:45:31 +00:00
hanna
aec83b401d
SSG multithreading doesn't play well with some I/O changes made since I last svn up'd. Reverting until I can find the reason.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1766 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 19:48:57 +00:00
hanna
8a503c86b6
Code supporting SSG proof-of-concept shared memory parallelism.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1765 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:56:16 +00:00
ebanks
fb619bd593
-Refactoring: make GenotypeCalculationModel constructors empty so that they don't have to be updated every time we add a new parameter; instead put that logic in the super class's initialize method (making everything protected so that only the factory can access them)
...
-Adding initial version of Multi-sample calculation model. This still needs much work: it needs to be cleaned up and finished. Right now, it (purposely) throws a RuntimeException after completing the EM loop.
Also:
-Fix logic in GenotypeLikelihoods.setPriors
-Add logger to the models for output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1764 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:10:36 +00:00
sjia
98076db6b4
Modified CreatePedFileWalker to output PED file given HLA allele names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1763 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 03:06:42 +00:00
hanna
56bc4fa21a
Fixed bug where not all alignments were returned if read aligned to multiple locations. Enhanced test suite to validate all alignments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1762 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-04 18:20:20 +00:00
hanna
05aa928e3e
Fix off-by-number-of-deletions issue with negative strand reads. Improved performance by factor of 2.5x.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1761 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-03 21:55:18 +00:00
chartl
7605ee500c
Idiocy! All tests were being disabled because I forgot the instanceof
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1760 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:04:56 +00:00
chartl
88d0890cc3
Made PooledGenotypeConcordance a standard test in VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1759 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:03:31 +00:00
aaron
7fc4472e6d
A big fix for MergingSamRecordIterator, where we weren't correctly handling the comparisons of SAMRecords correctly (we weren't applying the new reference index first, so sometimes the MT contig would be ID 23, sometimes 24 in different records).
...
Also a fix to the GLF tests, and a correction to PrintReadsWalker to remove the close() on the output source, the source handles that itself (and you get a double close).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1758 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:35:35 +00:00
chartl
68cb2ee54b
Tweaks to parameters for NQS analysis walkers; change to PowerAndCoverage for Jason Flannick (can input the number of alleles to compute power for - i.e. doubletons, tripletons; rather than statically checking singletons.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1757 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:11:27 +00:00
ebanks
7249fade05
updated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1756 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 18:10:34 +00:00
ebanks
53a4bd7f51
A better understanding of what's going on means no need for clearing the cache
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1755 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 18:07:46 +00:00
aaron
e885cc4b21
changes for corrected GLF likelihood output, along with better tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 20:45:05 +00:00
hanna
2309d19f6f
Bug fix from Michael Ross: mark second read in sequence as second of pair.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1753 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 14:34:36 +00:00
aaron
2e4949c4d6
Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
ebanks
303972aa4b
Yup, I broke the build...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1750 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:20:43 +00:00
ebanks
841d25cc44
Added ability to set the priors after construction (and requiring a flushing of the likelihoods cache)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1749 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 19:55:49 +00:00
hanna
665951f9f0
Support negative strand alignments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1748 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 18:10:26 +00:00
hanna
d3b1732cca
Start of refactoring effort. Make construction of alignment object simpler.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1747 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 15:19:31 +00:00
hanna
70e1aef550
Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 22:23:19 +00:00
aaron
b1c321f161
Adjusted Genotype concordance to more accurately use the new Genotyping code, fixed the VCF rod, and temp. fix the build by reintroducing Shermans ReadCigarFormatter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1745 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 21:28:21 +00:00
sjia
9b78a789e2
HLA Caller 2.0 Walkers:
...
CalculateBaseLikelihoodsWalker.java walks through reads calculates likelihoods using SSG at each base position
CalculateAlleleLikelihoodsWalker.java walks through HLA dictionary and calculates likelihoods for allele pairs given output of CalculateBaseLikelihoodsWalker.java
CalculatePhaseLikelihoodsWalker.java walks through reads and calculates likelihoods score for allele pairs given phase information
File Readers:
BaseLikelihoodsFileReader.java reads text file of likelihoods outputted by SSG
FrequencyFileReader.java reads text file of HLA allele frequencies
PolymorphicSitesFileReader.java reads text file of polymorphic sites in the HLA dictionary
SAMFileReader.java reads a sam file (used to read HLA dictionary when in another walker)
SimilarityFileReader.java reads a text file of how similar each read is to the closest HLA allele (used to filter misaligned reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1744 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:45:55 +00:00
chartl
281a77c981
Bugfix. isMismatch() was actually computing isMatch().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1743 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:04:59 +00:00
chartl
e28b45688c
More NQS Related Walkers to play with
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1742 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:01:04 +00:00
ebanks
9ef80e3c3c
One minor addition: to incorporate Pooled calling (and to be as general as possible), we allow the genotype calculation model to use rods if it wants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1741 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 17:05:59 +00:00
ebanks
19bfe43173
First pass at a unified caller, being checked in now so Mark can give feedback if he chooses and so Matt can debug issues with the ArgumentCollection class.
...
Some notes:
1. This design should be flexible enough to include pooled calling (for now) after discussions with Chris.
2. Using the unified caller with the SingleSampleCalculationModel emits the exact same output as SSG over all of chr20 for NA12878. Additionally, when we include the "max deletions allowed at a locus" argument (so we don't try to call SNPs at deletion sites), it removed 233 SNP calls in chr20 that were clearly indel artficts.
3. The MultiSampleEMCalculationModel is still a work in progress and will be checked in later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1740 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 16:48:15 +00:00
ebanks
8bd345ba00
Generalized deletions in pileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1739 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 15:58:43 +00:00
andrewk
6134f49e3c
Convert de novo SNP caller to run using parent1 and parent2 BAM files (by splitting contexts by reader using getMergedReadGroupsByReaders) instead of geli files providing a large speed-up and obviating the need for large whole-genome geli files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1738 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:42:21 +00:00
andrewk
5dab95aa5a
Fix getMergedReadGroupsByReaders so that it provides read groups in the same way Picard does so that it works correctly when input read files have no clashes in their read groups and retain their original read group names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1737 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:35:50 +00:00
andrewk
5662a88ee1
Cosmetic change to list sampling functions: the typical usage of n and k were reversed. No change in functionality of the classes has been made and unit tests still pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1736 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 18:12:32 +00:00
aaron
39598f1f0a
switching the concordance walker over to the new Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1735 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 15:46:36 +00:00
asivache
bce2f0d7cf
Now instantiates the list of alternative consenses to evaluate as LinkedHashSet to guarantee iterator traversal order. Old implementation used HashSet and exhibited unstable behavior when two alt consenses turned out to be equally good: depending on the run conditions (including size of the interval set being cleaned??), either one could be seen first as selected as the 'best' one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1734 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 06:15:46 +00:00
asivache
663175e868
Bug fix: when jumping onto next contig (chromosome), the walker was erasing last mismatch interval from the previous chr it was still holding without printing it; now it gets printed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1733 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 22:24:34 +00:00
asivache
92c6efabb7
moving IndelGenotyper out of playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1732 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:44:49 +00:00
asivache
aec61c558b
moving IndelGenotyper out from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1731 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:43:53 +00:00
chartl
fe6d810515
Some basic commits that I've been sitting on for a while now:
...
@ PooledGenotypeConcordance - changes to output, now also reports false-negatives and false-positives as interesting sites. It's been like this in my directory for ages, just never committed.
@NQSExtendedGroupsCovariantWalker - change for formatting.
@NQSTabularDistributionWalker - breaks out the full (window_size)-dimensional empirical error rate distribution by the window. So if you've got a window of size 3; the quality score sequences 22 25 23 and 22 25 24 have their own bins (each of the 40^3 sequences get one) for match and mismatch counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1730 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:35:50 +00:00
sjia
f7684d9e1b
ImputeAllelesWalker fills missing portions of HLA dictionary based on best allele matches
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1729 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 18:51:46 +00:00
sjia
235de38c2e
Updates to FindClosestAlleleWalker and CreateHaplotypesWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1728 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:41:58 +00:00
aaron
130a01a40a
delete the integration test temp files when the test is over
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1727 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:34:08 +00:00
aaron
2b7d39035a
switched over the FastaAlternateReferenceWalker to the Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1726 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:09:43 +00:00
aaron
7ffc1d97ef
Cut DeNovoSNPWalker over to the new Variation system, some renaming of methods on the Variation interface, and some corrections on the interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1724 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 04:35:52 +00:00
depristo
392152f149
1000x performance improvements to MSG for crisis control
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1723 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 23:44:33 +00:00
hanna
44879c81b0
Add in weights. Massive performance improvements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1722 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 23:19:15 +00:00
hanna
3b79f9eddc
Support 'N's and other mismatch characters in the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1721 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 21:41:30 +00:00
hanna
08e8d2183a
Indels supported. Variable gap penalties are not yet taken into account.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1720 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 21:03:02 +00:00
aaron
d2af26e81f
Pooled EM SNP Rod converted over to the Variation interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1719 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:33:11 +00:00
ebanks
97105ac001
We need to return a null RODRecordList when the default value is null (as opposed to a list with a single null value), because that's what everyone is expecting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1718 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:23:12 +00:00
ebanks
d4b40bc06f
Filter for reads with missing read groups so we can safely assume all reads have valid read groups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1717 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:10:26 +00:00
ebanks
90de2e0cde
Added ability to specify whether you want to use a point estimate or fair coin test calculation; for now you can use either but fair coin test is still experimental as it needs to be parametrized correctly. This job will hopefully be done by the future Bioinformatic Analyst...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1716 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:29:50 +00:00
aaron
d262cbd41c
changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:16:11 +00:00
sjia
1ee8ba590c
Reads cigar files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1713 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:14:10 +00:00
sjia
9422156e09
Finds closest allele for each read in bam file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1712 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:12:20 +00:00
sjia
5c5151c4e7
Creates ped file from reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1711 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 02:48:29 +00:00
hanna
b0ec7fc144
More comprehensive testing of BWT (mismatches only) module, and lots of bug fixes.
...
Limitations:
1) Can't handle RC alignments.
2) Can't handle indels.
3) Can't handle N's in reference bases.
4) Stops at first hit.
Ran BWT over a test suite of 800k Ecoli reads. After removing alignments with indels / reads with Ns, the remaining reads were aligned with quality 'equal to' that of the alignment stored in the BAM file. In this case 'equal' quality is <= mismatches to the reference as the existing alignment stored in the BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1710 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 23:44:59 +00:00
sjia
b446b3f1b6
CreateHaplotypeWalker now gives correct output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1709 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 21:13:52 +00:00
aaron
eeb14ec717
a couple of light changes to GenomeLocSortedSet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1708 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:38:53 +00:00
sjia
3916e165fb
New walker to output haplotypes for each read (for SNP analysis or imputation, etc)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1707 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:26:43 +00:00
ebanks
423a3ee894
Added a sequenom rod to empower Carrie to convert 1KG validation SNPs to sequenom format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1706 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:22:09 +00:00
chartl
63f3d45ca4
fixing the build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1705 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:04:09 +00:00
chartl
540e1b971f
And we fix one boneheaded mistake, which was actually causing the problem; though the last change was still correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1704 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:26:45 +00:00
chartl
124ca68fa8
And an IMMEDIATE minor fix (want neighborhood quality > base quality to be represented correctly)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1703 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:21:09 +00:00
chartl
8cdb78ebee
More sophisticated version of the NQSCovariantWalker - modified to be more explicit about how much higher the
...
quality score of a particular base is than the quality score of its neighbors. The granularity of the binning
jumps from 32 groups to 860 groups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1702 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:18:24 +00:00
hanna
856bbd0320
Let Picard specify the default compression level.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1701 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:01:48 +00:00
aaron
f783cb30e0
adding an interface so that the current @Requires with ROD annotations work in walkers like VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1700 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:24:05 +00:00
hanna
ebfbe56b43
Make sure compression level always gets pushed into SAMFileWriterFactory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1699 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:20:26 +00:00
asivache
fa87dd386d
Now uses rodRefSeq in its new reincarnation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1698 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:19:36 +00:00
asivache
bf7cd66d53
New, simpler rodRefSeq. Fully relies on the ROD system standard mechanisms. Multiple transcripts over a given location will be now returned by the ROD system itself as RodRecordList<rodRefSeq>; and yes, rodRefSeq does represent a single transcript record now and implements Transcript interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1697 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:18:25 +00:00
asivache
8fa4c93f5a
Transcript is now simply an interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1696 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:13:31 +00:00
asivache
fe36289e44
Noone needs this, probably... Old experimental code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1695 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:50 +00:00
asivache
1bd4c0077c
Now that ROD system supports overlapping RODs, we do not need rodRefSeq to be too smart and read in all the overlapping records (transcripts) on its own; leave it to the generic ROD mechanism.
...
PARTIAL commit; new, simpler rodRefSeq will reappear in a seq.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1694 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:16 +00:00
sjia
aa66074a0e
Compares each read to the HLA dictionary and outputs closest allele, as well as other stats
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1693 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 16:17:23 +00:00
aaron
11c32b588f
fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:54:47 +00:00
ebanks
b0fa19a0b2
Fixed recal integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1689 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:22:32 +00:00
ebanks
0748d80baa
Added a convenience method in rodDbSNP to deal with Andrey's changes to the rod. Now you can just ask for the first real SNP rod from the list and not have to think about how it works.
...
CountCovariates uses it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1688 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:15:40 +00:00
ebanks
6780476fb5
updated to deal with new dbSNP rod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1687 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 19:46:32 +00:00
hanna
14477bb48e
Unidirectional alignments with mismatches now working. Significant refactoring will be required.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1686 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 19:05:10 +00:00
sjia
22932042ea
Combined Scores, bug fixed for printing HLA-C
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1685 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:28:16 +00:00
ebanks
682b765536
bug: need to upper case chars so that == works throughout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1684 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:20:43 +00:00
asivache
d7d0b270d1
now supports blacklisting lanes (with -BL option will ignore reads from any of the specified lanes)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1682 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:57 +00:00
asivache
57d31b8e9b
Filter that discards reads from specific lanes; and also its friend that helps blacklisting a set of lanes from GATK command line a one-liner.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1681 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:06 +00:00
aaron
83a9eebcc4
fixed a bug I checked in that Eric found, for intervals with no start or stop coordinate. Now I owe Eric a cookie, and Milk Street is so far away. Damn.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1679 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 04:34:18 +00:00
ebanks
5ce42cbab3
After thinking about this a bit more, it makes sense to pull this functionality out of my walker and into the GenomeLocParser where everyone else can benefit from it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1677 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 01:32:35 +00:00
aaron
7bfb5fad27
fixing the dbSNP test. Also removing unnessasary comments from the GenomeLocParser, added some tests, and commented out the performance test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1676 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 23:32:24 +00:00
aaron
39a47491a9
changes to make GenomeLoc string parsing 25% faster
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1675 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 22:37:47 +00:00
ebanks
b1dc6d65e4
interval merging is now blazingly fast
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1674 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 21:15:04 +00:00
asivache
15135788ca
OK, let's bite the bullet. Now rodDbSNP objects are 'isSNP()' only when they are annotated as 'exact', not a 'range'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1673 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 19:25:16 +00:00
asivache
8ad181f46f
Note to myself: do 'ant clean' now and then or old versions of the code that suddenly became invalid will stick around. The world is not perfect, and neither is automatic dependency resolution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1672 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:40:52 +00:00
asivache
fb09835ef8
Changed to accomodate new ROD system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1671 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:10:56 +00:00
asivache
d2d1354199
Now uses BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1670 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:49 +00:00
asivache
f4d270cba4
These classes now use BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1669 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:15 +00:00
asivache
29adc0ca1c
Little class that can be used to simulate the results returned by the old ROD system. This is needed to keep couple of tests from breaking. All the code that uses this class must be changed urgently to accomodate the data as returned by new ROD system, and the corresponding tests (MD5 sums) have to be modified as well since some data as seen through the new ROD system is indeed different.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1668 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:58:56 +00:00
asivache
a6bd509593
Changing the carpet under your feet!! New incremental update to th eROD system has arrived.
...
all the updated classes now make use of new SeekableRodIterator instead of RODIterator. RODIterator class deleted. This batch makes only trivial updates to tests dictated by the change in the ROD system interface. Few less trivial updates to follow. This is a partial commit; a few walkers also still need to be updated, hold on...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1667 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:55:22 +00:00
asivache
4c67a49ccb
Removed unused imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1666 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:45:22 +00:00
hanna
e7f44ada98
Make unpackList public static so that Doug can use it in the scatter/gather framework.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1665 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 15:32:49 +00:00
ebanks
7b627fd622
Check for empty interval lists to merge
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1664 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 04:34:26 +00:00
hanna
7f5778c966
Update gsadevelopers -> gsahelp.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1663 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-20 23:36:54 +00:00
aaron
3a487dd64e
little fixes; also fixed a tyPo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:38:51 +00:00
aaron
b6d7d6acc6
fix for the eval tests, and a change to the backedbygenotypes interface, more changes to come
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1661 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:25:16 +00:00
depristo
4318f75910
tiny cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1660 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:04:25 +00:00
depristo
3a341b2f06
Fixes for VariantEval for genotyping mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
aaron
7b39aa4966
Adding the VCF ROD. Also changed the VCF objects to much more user friendly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
sjia
83e6e5a3e4
Calculates Probability for each allele combination (using likelihood score and allele frequencies only)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1656 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 18:46:38 +00:00
ebanks
b19fd4d45c
Damn unit tests have a null Toolkit()...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1654 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 17:10:49 +00:00
ebanks
90626c843d
oops - we don't need reference bases, but we still need reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1653 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:24:45 +00:00
ebanks
2b2df4e1ba
- Fix the CleanedReadInjector to deal with -L intervals correctly.
...
- Some walkers don't use the ref base, so speed up traversals by not requiring it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1652 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:17:58 +00:00
ebanks
7da9ff2a9e
Put back the check that both chip and variant are not null.
...
Also, sanity check that ref is not 'N'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1651 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:03:54 +00:00
asivache
94618044e8
Starting an update of ROD system. These basic classes will completely replace old ones, but with this update they are not linked to anything, so this checkpoint should be safe.
...
The main reason for the change is that there can be (and are!) multiple RODs overlapping with a single reference base position in a single track. There can be two "trivial" RODs at the same location (e.g. samtools pileup will have two point-like records at putative indel sites: one for the reference, the other one for the indel itself). Or there can be one or more "extended" RODs (length >1), eg. dbSNP can report an indel at Z:510-525 AND a SNP at Z:515.
The ReferenceOrderedDatum object (and children) will not be changed, but it is now explicitly interpreted as a single data *record*, possibly out of many available from a given track for the current site. As long as single data record occupies one line in a data file, the new ROD system will take care of loading and keeping multiple records, including extended (length > 1) ones, and will automatically drop the records when they finally go out of scope. For one-line-per-record, multiple-records-per-site RODs, there is no need anymore for the hack used so far that involved passing ROD's own implementation of iterator through reflection mechanism (though it will still work)
* RODRecordList:
the ROD system (its iterators) will now always return a LIST of all RODs available at current position or at current query interval (see below). This class is a trivial wrapper for a list of ROD objects, with added location argument for the whole collection. The location of the RODRecordList is where the ROD system is currently sitting at: a single, current base on the reference (if next() traversal is performed), or the location of the query interval when returned by seekForward() (see below). The ROD objects themselves will have their locations set according to the original data in the file. Hence, perusing the above example of a dbSNP indel at Z:510-525 and SNP at Z:515, when moving to the position Z:515 the ROD system will return a RODRecorList with location Z:515, and with two ROD objects packaged inside, one with location Z:510-525, the other with Z:515.
*RODRecodIterator:
Almost identical to old SimpleRODIterator used by ReferenceOrderedData; this is a low-level iterator that walks over records in the data file (with a callback to ROD's ::parseLine() to parse real data)
*SeekableRODIterator:
a decorator class that wraps around Iterator<ROD> (such as RODRecordIterator) and makes the data traversable by reference position, rather than record by record. This is reimplementation of the old RODIterator. SeekableRODIterator's ::next() moves to the next position on the ref and returns all RODs overlapping with that position (as a RODRecordList). This iterator also adds a seekForward(loc) operation, that allows fast forwarding to a specified position or interval. Length > 1 query arguments (extended intervals) are fully supported by seekForward(), the returned RODRecordList wil contain all RODs overlapping with the specified interval, and the location of the returned RODRecordList object will be set to that query interval. NOTE: it is ILLEGAL to perform next() after a seekForward() query with length > 1 interval. seekForward() with point-like (length=1) interval reenables next().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1650 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 15:58:37 +00:00
ebanks
66a4de9a1d
Genotype check should be case-insensitive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1649 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 03:23:30 +00:00
hanna
c186a49d55
Time for a reorganization. Repackage generally useful alignment classes lower in the package structure, and create a subpackage for bwa-specific code. Repackage BWA alignment code away from BWT representation. Isolate byte- and word-packing streams in another package that will ultimately be killed off en masse.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1648 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 23:28:47 +00:00
hanna
b4df089b59
Putting some of the required data structures together for imperfect lookup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1647 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 22:43:11 +00:00
hanna
355136928e
Play nice with other jobs in this VM -- don't close stdout / stderr.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1646 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 18:55:08 +00:00
sjia
0e73b2ba8e
Use population allele frequencies to distinguish between top candidates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1645 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 15:49:19 +00:00
chartl
534486a254
Output formatting changed:
...
- summary output now reported as a percentage rather than proportion; 2 sigfigs
- fixed minor bug where FNR was calculated over total calls rather than total variant sites
- column headers are_now_contiguous_strings
- spacing fixed
- "No Call" separated from "Ref Call" as its own column
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1644 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 14:00:25 +00:00
depristo
73bec6f36d
Now uses expanding array list for coverage histograms. No hard limit on maximum depth now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1643 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 23:27:25 +00:00
chartl
4ad46590a3
Changes to PooledGenotypeConcordance:
...
Additional output & better output formatting. It has now undergone a good five hours of testing; and for pools of size 1 outputs exactly the same statistics as GenotypeConcordance (when GenotypeConcordance is modified to do nothing on reference='N'); and for pools of many sizes outputs close to the expected (by genetics) statistics. Looks like this is working properly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1642 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 21:45:01 +00:00
chartl
386a6442ba
Actually deleted now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1641 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 20:28:06 +00:00
chartl
8fce376792
Changes:
...
Deletion: PooledGenotypeConcordanceNew
Rewrite: PooledGenotypeConcordance. It works, and is blazing fast compared to the earlier version (1 order of magnitude speedup)! And is now entirely non-hackey, as opposed to before when there were some hacky bits.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1640 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 20:22:16 +00:00
asivache
3e289fcaa4
A little piece that PairMaker needs in order to compile ;)
...
Iterates synchronously over two (name-ordered) single-end alignment SAM files with, possibly, multiple alignments per read and for each read name encountered returns pairs<all alignments for end1, all alignments for end2>
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1639 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 19:17:40 +00:00
asivache
2f29cf59ba
Very early, half-baked version. All it can do right now is to take two SAM files with end1 and end2 individual single-end alignmnets from a pair-end run and spit out a "paired" BAM file that contains ONLY properly paired ends (both ends align uniquely && both ends align to the same chromosome && the ends align in proper orientation). Insert size is currently not used (and not set in the output). Unpaired/unmapped reads are NOT transferred into the output bam. For the pairs that do get written, the output is (should be) standard-conforming: all flags are properly set and mate pair information is correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1637 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 18:38:18 +00:00
ebanks
5d85bd9671
By default, VF should ask for deleted bases so that they show up in coverage.
...
The Strand filter then needs to ignore those bases when determining bias.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:46:09 +00:00
ebanks
a7c306f757
-deal with offsets that can be -1
...
-added option to have "D"s inserted for deleted bases in pileup strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1635 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:44:57 +00:00
hanna
01a9b1c63b
Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 08:33:56 +00:00
aaron
eedf55e94d
temp fix for a broken test, we'll fix the test tomorrow. We promise, we're engineers, we love our tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1633 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 04:36:42 +00:00
chartl
f6bdb47bb6
Addition:
...
@PooledGenotypeConcordanceNew - a new version of the pooled genotype concordance test for Variant Eval. Code altered to be more extensible, use a private class for handling the count tables so it doesn't gunk up the code in the test itself, and for easy debugging. The hackier methods from the original were rewritten properly. Currently computes more statistics that it outputs. Code compiles, is never called by anything, and breaks none of the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1632 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 04:14:58 +00:00
aaron
542d817688
more cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1631 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:42:03 +00:00
hanna
9f7cf73411
Output stream management fixes. I completely screwed up the output stream management system, but cleverly masked this fact by breaking some other stream management functionality that masked the problem.
...
Sigh.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1630 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:06:45 +00:00
hanna
17758b381c
Properly initialize redirected output streams in case of out and err.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1629 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 19:47:43 +00:00
andrewk
00dfe014b7
Added option to FastaReferenceWalker to change output FASTA file format's line width and to remove header lines; allows dumping raw sequence using intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1628 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 18:00:30 +00:00
hanna
b69eb208a6
Always create output files, even if no output was written to them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 17:58:14 +00:00
aaron
b401929e41
incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
ebanks
6783fda42a
Updated unit test to reflect changes to vcf output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1623 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 01:56:08 +00:00
andrewk
fb254759cb
Trivial: Don't print reduce result
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1621 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:42:20 +00:00
hanna
118071cfd8
Proof-of-concept perfect read aligner, implemented as described in sec 2.4 of BWA paper. Has successfully aligned a handful of reads. Requires significant cleanup and refactoring.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1617 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:54:56 +00:00
ebanks
01e7b39c8d
1. Don't print out values in filter field of the VCF.
...
2. Fix ratio printouts (for params file)
3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:03:39 +00:00
ebanks
436f543b3b
I owe Doug a beer for finding this:
...
don't print out intervals to be merged if they're not within the global -L intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:22:30 +00:00
chartl
7d6d114ab5
Additions:
...
@NQSMismatchCovariantWalker - Walks along the gene calculating the table
# NQS
# Q score
# mismatches at non-dbsnp sites
# total number of bases at non-dbsnp sites
And prints it out at the end.
Changes:
@PooledGenotypeConcordance now works. Takes a path to a file listing a bunch of hapmap IDs in whatever pool we want to check, reads those in, and checks for concordance by name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1614 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:12:04 +00:00
sjia
9be1832d7b
Phasing version 1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1613 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 16:10:37 +00:00
asivache
a009592662
the life in the magical kingdom of fully spec-conforming SAM files would be so... magical. For now, however, there are plenty of ways to end up with inconsistent SAM records. For instance, a SAM file with missing header will result in SAM records with ref. name set, but getReferenceIndex() returning null. This, in turn, was tripping isReadUnmapped(). The method is now fixed, so that it suffices to have *either* reference name *or* reference index set for the read to be considered mapped (the flag is still checked)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1612 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 16:04:19 +00:00