ebanks
c9c8fd1fef
Added the discovery LOD score to the meta data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1825 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 01:24:06 +00:00
ebanks
0c06bf9dbc
Explicitly set output to GELI now that default is VCF
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1824 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 22:12:03 +00:00
hanna
a76fac4687
Cleanup existing speedups. Minor performance improvements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1823 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 21:51:18 +00:00
hanna
837ae1d33a
Optimization: from 22k reads/min - 30k reads/min.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1822 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:59:29 +00:00
ebanks
96b8499a31
Remodeled version of the UnifiedGenotyper.
...
We currently get identical lods and slods as MultiSampleCaller (except slods for ref calls, as I discussed with Jared) and are a bit faster in my few test cases. Single-sample mode still emulates SSG.
The remaining to do items:
1. more testing still needed
2. we currently only output lods/slods, but I need to emit actual calls
3. stubs are in place for Mark's proposed version of the EM calculation and now I need to add the actual code.
More check-ins coming soon...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1821 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:27:01 +00:00
ebanks
b28446acac
Multi-sample calls now have associated meta-data (SLOD, allele freq), which wil
...
l soon actually be used...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1820 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:08:43 +00:00
hanna
db642fd08b
Optimization: from 10k reads/sec - 22k reads/sec..
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1819 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 18:07:15 +00:00
aaron
77499e35ac
fixes for GSA-199: Need easier way to write binary outputs to standard output. GLF and VCF now have stream constructors, and can get dumped to standard out.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1818 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:20 +00:00
hanna
f37564e63a
Our BWA is now looking at roughly the same number of candidate alignments as BWA/C. Performance is now at 11k reads / min, still a long way from BWA/C.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1817 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:04 +00:00
chartl
8d0e057d83
I got bored today and decided to write the confusion matrix calculator. At present it is untested. I'm submitting it to subversion to make sure
...
I have previous revision to revert back to.
This is a calculator that will calculate:
P[ True base is X | read base mismatches, secondary base is Y, previous K bases are Z1,Z2,...ZK ]
where the number of pervious reference bases to take into account is user-defined. The secondary base is optional as well.
--usePreviousBases k
tells the walker to use the k previous reference bases in the transition table
--useSecondaryBase
tells the walker to use the secondary base at a locus in the transition table
these can be used together.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1816 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 02:55:29 +00:00
ebanks
be92a1e603
Don't try to close if the lazy initialize hasn't triggered
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1815 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 01:20:25 +00:00
chartl
ec83bc6ec5
This somehow didn't make it into subversion the last time.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1814 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 21:11:13 +00:00
chartl
ecbb11e017
Modified PowerBelowFrequency to ignore reads below a user-defined mapping quality. Request from Jason Flannick.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1813 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:59:24 +00:00
chartl
ec68ae3bc5
Added a filter that will split the read set by a threshold of mapping quality (Request from Jason Flannick)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1812 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:58:37 +00:00
chartl
0d73fe69e7
Recalibrator by NQS. Had this puppy running all afternoon. Thing had got through 100,000,000 reads before I decided to delete my sting tree. *sigh*, a little more delay.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1811 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:55:02 +00:00
chartl
ee0afba0af
Recalibration stuff...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1810 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 20:51:39 +00:00
ebanks
caf689821f
added method to get normalized posteriors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1809 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 02:33:22 +00:00
ebanks
cf7a26759d
-use the getReadGroup() function that was added to picard for us
...
-clean up some include lines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1808 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-12 01:39:32 +00:00
hanna
d844d1c496
SAMFileWriters specified as command-line arguments were sometimes incorrectly altering the default short name. Make sure short name is not specified if shortName is not specified but fullName is.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1807 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 19:16:46 +00:00
hanna
da084357db
Fixed minor typo in output message.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1806 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 18:56:54 +00:00
aaron
62c484b57a
Fixes for GSA-201, where enumerated types in command line arguments had to be defined as all uppercase for the system to work.
...
Also a little playground walker that changes the sort order flag of a BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1805 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 18:11:32 +00:00
hanna
32d55eb2ff
Fix issue Eric was seeing with java.lang.Error in unmap0.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1804 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:46:56 +00:00
ebanks
9f3482ef11
VCF is both a multi- and single- sample format, so we shouldn't be throwing an exception when used for SS
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1803 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:43:26 +00:00
jmaguire
d9f5a314ac
avoid an out of memory error by no putting more than 5000 reads in the cache. on pilot1 at least those are crazy loci anyway.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1802 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 14:56:55 +00:00
hanna
f4b6afb42c
JVM issue id 5092131 ( http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5092131 )
...
was causing OOM issues with the new mmapping fasta file reader during large jobs.
Temporarily reverting the reader until a workaround can be found.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1801 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:45:46 +00:00
chartl
6d7f4481e4
Changed traversal type slightly
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1800 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:11:48 +00:00
ebanks
a9f3d46fa8
Your time has come, SSG.
...
Fare thee well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1799 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 20:27:56 +00:00
jmaguire
8fdb8922b8
now output in the exact format that works with sequenom software.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1798 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 20:06:27 +00:00
aaron
98e3a0bf1a
VCF can now be emitted from SSG. The basic's are there (the genotype, read depth, our error estimate), but more fields need to be added for each record as nessasary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:50:04 +00:00
hanna
95f24d671d
Fixed 'visualization' of reads that didn't match bwa's alignments exactly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1796 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:45:30 +00:00
kiran
29ad6cd876
Made redundant by BCMMarkDupes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1795 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 18:47:20 +00:00
kiran
94d82d1915
Matthew Bainbridge's duplicate removal utility for 454 data. This code should eventually be moved into a read walker. For now, it's being introduced into the repository as-is (well, with one minor change to make the handling of command-line arguments a little more straightforward).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1794 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 18:32:37 +00:00
ebanks
15bf014e0b
logger.info -> logger.debug (don't want to risk filling up my log on genome-wide calls)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1792 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 17:53:11 +00:00
ebanks
df8ea8f437
UG integration test. This was the old SSG test with MD5s updated.
...
I'll need to add some multi-sample tests in a bit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1791 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 17:43:58 +00:00
ebanks
008455915a
One way of making the integration test stop failing is to remove it...
...
[waiting for Matt to cringe...]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1789 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 17:08:41 +00:00
chartl
f89a89ffe3
Use of AlleleFrequency as an input to PowerAndCoverage is deprecated by the new walker. Reverting to the standard "power at 1 allele" calculation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1788 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 16:07:45 +00:00
chartl
ae05f5c7ad
Fixin the header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1787 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 15:49:28 +00:00
chartl
11ff1e09b8
A new power walker for the user to feed in a number of alleles. Call that number k. Output is:
...
Locus Power_for_k_alleles Power_for_k-2_alleles Power_for_k-2_alleles ... Power_for_1_allele
This was a request from Jason Flannick & the T2DB group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1786 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 15:35:35 +00:00
ebanks
04fe50cadd
*** We no longer have a separate model for the single-sample case. ***
...
For now, a single sample input will be special-cased in the EM model - but that will change when the EM model degenerates to the single sample output with a single sample as input. For now, the EM code for multi-samples isn't finished; I'm planning on checking that in soon.
The SingleSampleIntegrationTest now uses the UnifiedCaller instead of SSG, and so should all of you. More on that in a separate email.
Other minor cleanups added too.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1785 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 14:08:57 +00:00
jmaguire
32128e093a
misc. changes to get the numbers back to the baseline while keeping the speedup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1784 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 12:27:07 +00:00
jmaguire
d38a0d04b9
fix a snp mask offset error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1783 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 12:25:40 +00:00
kiran
829e99413b
Rescores a variant after removing duplicates (defined very strictly as reads with the same start points).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1782 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 03:07:36 +00:00
hanna
fcb6a992c8
Switched IndexedFastaSequenceFile over to use memory mapping to load data rather than
...
the loop-with-small block size. Performance improvements in loading refs are extreme;
segments can be loaded in <1ms. chr1 in its entirety can be loaded in 1.5sec (down
from 30sec).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1781 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 00:07:15 +00:00
jmaguire
02d2492d68
Simple tool for picking sequenom probes for SNPs. Can be extended to indels if necessary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1780 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 23:46:41 +00:00
ebanks
1905b5defa
Hash by chromosome for now to reduce memory. This is a temporary solution until we decide how to reture the Injector for good.
...
Also, with Picard's latest changes, we need to make sure we don't double-close the sam writer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1779 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:06:25 +00:00
ebanks
f9a1598d75
Reformatting
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1778 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:03:34 +00:00
ebanks
203c626fc2
A wrapper around the GenotypeLikelihoods class for the UnifiedGenotyper. This wrapper incorporates both strand-based likelihoods and a combined likelihoods over both strands.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1777 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 19:57:37 +00:00
sjia
5bdcc2b4dc
Included HLA class 2 genes in CreatePedFileWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1776 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:46:51 +00:00
sjia
8f896b734f
Included HLA class 2 genes in CreatePedFileWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1775 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:28:01 +00:00
aaron
f9a0eefe4b
GELI_BINARY is now functional, and can be used as a variant type in SSG (-vf=GELI_BINARY). Also fixed the max mapping quality column in both GELI output formats, we haven't been correctly outputing up until now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1774 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:20:34 +00:00
chartl
225b9bccc1
Modifications to NQSClusteredZScoreWalker to output empirical mismatch rates on bins by both Z-score and reported Q-score, rather than averaging over all Q-score bins for each Z-score.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1773 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 13:45:12 +00:00
depristo
8dd0924b37
Minor performance improvements to VariantEval -- now all of the CPU time is spent dealing with the ROD system...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1772 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 23:40:30 +00:00
aaron
4554ca1b28
more cleanup, depecaited the old genotype, corrected SNPCallsFromGenotypes' imports and two other classes that depend on it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1771 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 19:09:27 +00:00
aaron
3aec76136f
Removing the AllelicVariant interface, which is replaced by the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 17:44:24 +00:00
depristo
1bd0c3c145
variant eval allows non Variation rod objects
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1768 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 13:04:26 +00:00
aaron
66fc8ea444
GSA-182: Adding support for BED interval files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1767 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 02:45:31 +00:00
hanna
aec83b401d
SSG multithreading doesn't play well with some I/O changes made since I last svn up'd. Reverting until I can find the reason.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1766 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 19:48:57 +00:00
hanna
8a503c86b6
Code supporting SSG proof-of-concept shared memory parallelism.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1765 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:56:16 +00:00
ebanks
fb619bd593
-Refactoring: make GenotypeCalculationModel constructors empty so that they don't have to be updated every time we add a new parameter; instead put that logic in the super class's initialize method (making everything protected so that only the factory can access them)
...
-Adding initial version of Multi-sample calculation model. This still needs much work: it needs to be cleaned up and finished. Right now, it (purposely) throws a RuntimeException after completing the EM loop.
Also:
-Fix logic in GenotypeLikelihoods.setPriors
-Add logger to the models for output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1764 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 18:10:36 +00:00
sjia
98076db6b4
Modified CreatePedFileWalker to output PED file given HLA allele names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1763 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-05 03:06:42 +00:00
hanna
56bc4fa21a
Fixed bug where not all alignments were returned if read aligned to multiple locations. Enhanced test suite to validate all alignments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1762 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-04 18:20:20 +00:00
hanna
05aa928e3e
Fix off-by-number-of-deletions issue with negative strand reads. Improved performance by factor of 2.5x.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1761 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-03 21:55:18 +00:00
chartl
7605ee500c
Idiocy! All tests were being disabled because I forgot the instanceof
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1760 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:04:56 +00:00
chartl
88d0890cc3
Made PooledGenotypeConcordance a standard test in VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1759 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 20:03:31 +00:00
aaron
7fc4472e6d
A big fix for MergingSamRecordIterator, where we weren't correctly handling the comparisons of SAMRecords correctly (we weren't applying the new reference index first, so sometimes the MT contig would be ID 23, sometimes 24 in different records).
...
Also a fix to the GLF tests, and a correction to PrintReadsWalker to remove the close() on the output source, the source handles that itself (and you get a double close).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1758 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:35:35 +00:00
chartl
68cb2ee54b
Tweaks to parameters for NQS analysis walkers; change to PowerAndCoverage for Jason Flannick (can input the number of alleles to compute power for - i.e. doubletons, tripletons; rather than statically checking singletons.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1757 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:11:27 +00:00
ebanks
7249fade05
updated
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1756 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 18:10:34 +00:00
ebanks
53a4bd7f51
A better understanding of what's going on means no need for clearing the cache
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1755 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 18:07:46 +00:00
aaron
e885cc4b21
changes for corrected GLF likelihood output, along with better tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 20:45:05 +00:00
hanna
2309d19f6f
Bug fix from Michael Ross: mark second read in sequence as second of pair.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1753 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 14:34:36 +00:00
aaron
2e4949c4d6
Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
ebanks
303972aa4b
Yup, I broke the build...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1750 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:20:43 +00:00
ebanks
841d25cc44
Added ability to set the priors after construction (and requiring a flushing of the likelihoods cache)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1749 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 19:55:49 +00:00
hanna
665951f9f0
Support negative strand alignments.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1748 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 18:10:26 +00:00
hanna
d3b1732cca
Start of refactoring effort. Make construction of alignment object simpler.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1747 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 15:19:31 +00:00
hanna
70e1aef550
Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 22:23:19 +00:00
aaron
b1c321f161
Adjusted Genotype concordance to more accurately use the new Genotyping code, fixed the VCF rod, and temp. fix the build by reintroducing Shermans ReadCigarFormatter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1745 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 21:28:21 +00:00
sjia
9b78a789e2
HLA Caller 2.0 Walkers:
...
CalculateBaseLikelihoodsWalker.java walks through reads calculates likelihoods using SSG at each base position
CalculateAlleleLikelihoodsWalker.java walks through HLA dictionary and calculates likelihoods for allele pairs given output of CalculateBaseLikelihoodsWalker.java
CalculatePhaseLikelihoodsWalker.java walks through reads and calculates likelihoods score for allele pairs given phase information
File Readers:
BaseLikelihoodsFileReader.java reads text file of likelihoods outputted by SSG
FrequencyFileReader.java reads text file of HLA allele frequencies
PolymorphicSitesFileReader.java reads text file of polymorphic sites in the HLA dictionary
SAMFileReader.java reads a sam file (used to read HLA dictionary when in another walker)
SimilarityFileReader.java reads a text file of how similar each read is to the closest HLA allele (used to filter misaligned reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1744 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:45:55 +00:00
chartl
281a77c981
Bugfix. isMismatch() was actually computing isMatch().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1743 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:04:59 +00:00
chartl
e28b45688c
More NQS Related Walkers to play with
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1742 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 20:01:04 +00:00
ebanks
9ef80e3c3c
One minor addition: to incorporate Pooled calling (and to be as general as possible), we allow the genotype calculation model to use rods if it wants.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1741 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 17:05:59 +00:00
ebanks
19bfe43173
First pass at a unified caller, being checked in now so Mark can give feedback if he chooses and so Matt can debug issues with the ArgumentCollection class.
...
Some notes:
1. This design should be flexible enough to include pooled calling (for now) after discussions with Chris.
2. Using the unified caller with the SingleSampleCalculationModel emits the exact same output as SSG over all of chr20 for NA12878. Additionally, when we include the "max deletions allowed at a locus" argument (so we don't try to call SNPs at deletion sites), it removed 233 SNP calls in chr20 that were clearly indel artficts.
3. The MultiSampleEMCalculationModel is still a work in progress and will be checked in later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1740 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 16:48:15 +00:00
ebanks
8bd345ba00
Generalized deletions in pileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1739 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 15:58:43 +00:00
andrewk
6134f49e3c
Convert de novo SNP caller to run using parent1 and parent2 BAM files (by splitting contexts by reader using getMergedReadGroupsByReaders) instead of geli files providing a large speed-up and obviating the need for large whole-genome geli files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1738 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:42:21 +00:00
andrewk
5dab95aa5a
Fix getMergedReadGroupsByReaders so that it provides read groups in the same way Picard does so that it works correctly when input read files have no clashes in their read groups and retain their original read group names.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1737 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 06:35:50 +00:00
andrewk
5662a88ee1
Cosmetic change to list sampling functions: the typical usage of n and k were reversed. No change in functionality of the classes has been made and unit tests still pass.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1736 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 18:12:32 +00:00
aaron
39598f1f0a
switching the concordance walker over to the new Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1735 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 15:46:36 +00:00
asivache
bce2f0d7cf
Now instantiates the list of alternative consenses to evaluate as LinkedHashSet to guarantee iterator traversal order. Old implementation used HashSet and exhibited unstable behavior when two alt consenses turned out to be equally good: depending on the run conditions (including size of the interval set being cleaned??), either one could be seen first as selected as the 'best' one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1734 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 06:15:46 +00:00
asivache
663175e868
Bug fix: when jumping onto next contig (chromosome), the walker was erasing last mismatch interval from the previous chr it was still holding without printing it; now it gets printed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1733 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 22:24:34 +00:00
asivache
92c6efabb7
moving IndelGenotyper out of playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1732 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:44:49 +00:00
asivache
aec61c558b
moving IndelGenotyper out from playground
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1731 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:43:53 +00:00
chartl
fe6d810515
Some basic commits that I've been sitting on for a while now:
...
@ PooledGenotypeConcordance - changes to output, now also reports false-negatives and false-positives as interesting sites. It's been like this in my directory for ages, just never committed.
@NQSExtendedGroupsCovariantWalker - change for formatting.
@NQSTabularDistributionWalker - breaks out the full (window_size)-dimensional empirical error rate distribution by the window. So if you've got a window of size 3; the quality score sequences 22 25 23 and 22 25 24 have their own bins (each of the 40^3 sequences get one) for match and mismatch counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1730 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 19:35:50 +00:00
sjia
f7684d9e1b
ImputeAllelesWalker fills missing portions of HLA dictionary based on best allele matches
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1729 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 18:51:46 +00:00
sjia
235de38c2e
Updates to FindClosestAlleleWalker and CreateHaplotypesWalker
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1728 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:41:58 +00:00
aaron
130a01a40a
delete the integration test temp files when the test is over
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1727 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:34:08 +00:00
aaron
2b7d39035a
switched over the FastaAlternateReferenceWalker to the Variation system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1726 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 16:09:43 +00:00
aaron
7ffc1d97ef
Cut DeNovoSNPWalker over to the new Variation system, some renaming of methods on the Variation interface, and some corrections on the interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1724 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 04:35:52 +00:00
depristo
392152f149
1000x performance improvements to MSG for crisis control
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1723 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 23:44:33 +00:00
hanna
44879c81b0
Add in weights. Massive performance improvements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1722 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 23:19:15 +00:00
hanna
3b79f9eddc
Support 'N's and other mismatch characters in the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1721 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 21:41:30 +00:00
hanna
08e8d2183a
Indels supported. Variable gap penalties are not yet taken into account.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1720 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 21:03:02 +00:00
aaron
d2af26e81f
Pooled EM SNP Rod converted over to the Variation interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1719 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:33:11 +00:00
ebanks
97105ac001
We need to return a null RODRecordList when the default value is null (as opposed to a list with a single null value), because that's what everyone is expecting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1718 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:23:12 +00:00
ebanks
d4b40bc06f
Filter for reads with missing read groups so we can safely assume all reads have valid read groups
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1717 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 16:10:26 +00:00
ebanks
90de2e0cde
Added ability to specify whether you want to use a point estimate or fair coin test calculation; for now you can use either but fair coin test is still experimental as it needs to be parametrized correctly. This job will hopefully be done by the future Bioinformatic Analyst...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1716 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:29:50 +00:00
aaron
d262cbd41c
changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:16:11 +00:00
sjia
1ee8ba590c
Reads cigar files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1713 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:14:10 +00:00
sjia
9422156e09
Finds closest allele for each read in bam file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1712 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 03:12:20 +00:00
sjia
5c5151c4e7
Creates ped file from reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1711 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 02:48:29 +00:00
hanna
b0ec7fc144
More comprehensive testing of BWT (mismatches only) module, and lots of bug fixes.
...
Limitations:
1) Can't handle RC alignments.
2) Can't handle indels.
3) Can't handle N's in reference bases.
4) Stops at first hit.
Ran BWT over a test suite of 800k Ecoli reads. After removing alignments with indels / reads with Ns, the remaining reads were aligned with quality 'equal to' that of the alignment stored in the BAM file. In this case 'equal' quality is <= mismatches to the reference as the existing alignment stored in the BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1710 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 23:44:59 +00:00
sjia
b446b3f1b6
CreateHaplotypeWalker now gives correct output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1709 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 21:13:52 +00:00
aaron
eeb14ec717
a couple of light changes to GenomeLocSortedSet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1708 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:38:53 +00:00
sjia
3916e165fb
New walker to output haplotypes for each read (for SNP analysis or imputation, etc)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1707 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:26:43 +00:00
ebanks
423a3ee894
Added a sequenom rod to empower Carrie to convert 1KG validation SNPs to sequenom format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1706 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:22:09 +00:00
chartl
63f3d45ca4
fixing the build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1705 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:04:09 +00:00
chartl
540e1b971f
And we fix one boneheaded mistake, which was actually causing the problem; though the last change was still correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1704 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:26:45 +00:00
chartl
124ca68fa8
And an IMMEDIATE minor fix (want neighborhood quality > base quality to be represented correctly)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1703 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:21:09 +00:00
chartl
8cdb78ebee
More sophisticated version of the NQSCovariantWalker - modified to be more explicit about how much higher the
...
quality score of a particular base is than the quality score of its neighbors. The granularity of the binning
jumps from 32 groups to 860 groups.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1702 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:18:24 +00:00
hanna
856bbd0320
Let Picard specify the default compression level.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1701 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 19:01:48 +00:00
aaron
f783cb30e0
adding an interface so that the current @Requires with ROD annotations work in walkers like VariantEval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1700 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:24:05 +00:00
hanna
ebfbe56b43
Make sure compression level always gets pushed into SAMFileWriterFactory.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1699 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:20:26 +00:00
asivache
fa87dd386d
Now uses rodRefSeq in its new reincarnation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1698 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:19:36 +00:00
asivache
bf7cd66d53
New, simpler rodRefSeq. Fully relies on the ROD system standard mechanisms. Multiple transcripts over a given location will be now returned by the ROD system itself as RodRecordList<rodRefSeq>; and yes, rodRefSeq does represent a single transcript record now and implements Transcript interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1697 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:18:25 +00:00
asivache
8fa4c93f5a
Transcript is now simply an interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1696 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:13:31 +00:00
asivache
fe36289e44
Noone needs this, probably... Old experimental code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1695 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:50 +00:00
asivache
1bd4c0077c
Now that ROD system supports overlapping RODs, we do not need rodRefSeq to be too smart and read in all the overlapping records (transcripts) on its own; leave it to the generic ROD mechanism.
...
PARTIAL commit; new, simpler rodRefSeq will reappear in a seq.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1694 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 18:11:16 +00:00
sjia
aa66074a0e
Compares each read to the HLA dictionary and outputs closest allele, as well as other stats
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1693 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 16:17:23 +00:00
aaron
11c32b588f
fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:54:47 +00:00
ebanks
b0fa19a0b2
Fixed recal integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1689 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:22:32 +00:00
ebanks
0748d80baa
Added a convenience method in rodDbSNP to deal with Andrey's changes to the rod. Now you can just ask for the first real SNP rod from the list and not have to think about how it works.
...
CountCovariates uses it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1688 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:15:40 +00:00
ebanks
6780476fb5
updated to deal with new dbSNP rod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1687 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 19:46:32 +00:00
hanna
14477bb48e
Unidirectional alignments with mismatches now working. Significant refactoring will be required.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1686 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 19:05:10 +00:00
sjia
22932042ea
Combined Scores, bug fixed for printing HLA-C
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1685 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:28:16 +00:00
ebanks
682b765536
bug: need to upper case chars so that == works throughout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1684 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 18:20:43 +00:00
asivache
d7d0b270d1
now supports blacklisting lanes (with -BL option will ignore reads from any of the specified lanes)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1682 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:57 +00:00
asivache
57d31b8e9b
Filter that discards reads from specific lanes; and also its friend that helps blacklisting a set of lanes from GATK command line a one-liner.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1681 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 16:46:06 +00:00
aaron
83a9eebcc4
fixed a bug I checked in that Eric found, for intervals with no start or stop coordinate. Now I owe Eric a cookie, and Milk Street is so far away. Damn.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1679 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 04:34:18 +00:00
ebanks
5ce42cbab3
After thinking about this a bit more, it makes sense to pull this functionality out of my walker and into the GenomeLocParser where everyone else can benefit from it...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1677 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 01:32:35 +00:00
aaron
7bfb5fad27
fixing the dbSNP test. Also removing unnessasary comments from the GenomeLocParser, added some tests, and commented out the performance test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1676 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 23:32:24 +00:00
aaron
39a47491a9
changes to make GenomeLoc string parsing 25% faster
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1675 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 22:37:47 +00:00
ebanks
b1dc6d65e4
interval merging is now blazingly fast
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1674 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 21:15:04 +00:00
asivache
15135788ca
OK, let's bite the bullet. Now rodDbSNP objects are 'isSNP()' only when they are annotated as 'exact', not a 'range'.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1673 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 19:25:16 +00:00
asivache
8ad181f46f
Note to myself: do 'ant clean' now and then or old versions of the code that suddenly became invalid will stick around. The world is not perfect, and neither is automatic dependency resolution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1672 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:40:52 +00:00
asivache
fb09835ef8
Changed to accomodate new ROD system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1671 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:10:56 +00:00
asivache
d2d1354199
Now uses BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1670 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:49 +00:00
asivache
f4d270cba4
These classes now use BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1669 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 17:03:15 +00:00
asivache
29adc0ca1c
Little class that can be used to simulate the results returned by the old ROD system. This is needed to keep couple of tests from breaking. All the code that uses this class must be changed urgently to accomodate the data as returned by new ROD system, and the corresponding tests (MD5 sums) have to be modified as well since some data as seen through the new ROD system is indeed different.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1668 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:58:56 +00:00
asivache
a6bd509593
Changing the carpet under your feet!! New incremental update to th eROD system has arrived.
...
all the updated classes now make use of new SeekableRodIterator instead of RODIterator. RODIterator class deleted. This batch makes only trivial updates to tests dictated by the change in the ROD system interface. Few less trivial updates to follow. This is a partial commit; a few walkers also still need to be updated, hold on...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1667 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:55:22 +00:00
asivache
4c67a49ccb
Removed unused imports
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1666 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:45:22 +00:00
hanna
e7f44ada98
Make unpackList public static so that Doug can use it in the scatter/gather framework.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1665 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 15:32:49 +00:00
ebanks
7b627fd622
Check for empty interval lists to merge
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1664 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 04:34:26 +00:00
hanna
7f5778c966
Update gsadevelopers -> gsahelp.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1663 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-20 23:36:54 +00:00
aaron
3a487dd64e
little fixes; also fixed a tyPo
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:38:51 +00:00
aaron
b6d7d6acc6
fix for the eval tests, and a change to the backedbygenotypes interface, more changes to come
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1661 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:25:16 +00:00
depristo
4318f75910
tiny cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1660 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:04:25 +00:00
depristo
3a341b2f06
Fixes for VariantEval for genotyping mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
aaron
7b39aa4966
Adding the VCF ROD. Also changed the VCF objects to much more user friendly.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
sjia
83e6e5a3e4
Calculates Probability for each allele combination (using likelihood score and allele frequencies only)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1656 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 18:46:38 +00:00
ebanks
b19fd4d45c
Damn unit tests have a null Toolkit()...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1654 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 17:10:49 +00:00
ebanks
90626c843d
oops - we don't need reference bases, but we still need reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1653 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:24:45 +00:00
ebanks
2b2df4e1ba
- Fix the CleanedReadInjector to deal with -L intervals correctly.
...
- Some walkers don't use the ref base, so speed up traversals by not requiring it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1652 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:17:58 +00:00
ebanks
7da9ff2a9e
Put back the check that both chip and variant are not null.
...
Also, sanity check that ref is not 'N'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1651 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:03:54 +00:00
asivache
94618044e8
Starting an update of ROD system. These basic classes will completely replace old ones, but with this update they are not linked to anything, so this checkpoint should be safe.
...
The main reason for the change is that there can be (and are!) multiple RODs overlapping with a single reference base position in a single track. There can be two "trivial" RODs at the same location (e.g. samtools pileup will have two point-like records at putative indel sites: one for the reference, the other one for the indel itself). Or there can be one or more "extended" RODs (length >1), eg. dbSNP can report an indel at Z:510-525 AND a SNP at Z:515.
The ReferenceOrderedDatum object (and children) will not be changed, but it is now explicitly interpreted as a single data *record*, possibly out of many available from a given track for the current site. As long as single data record occupies one line in a data file, the new ROD system will take care of loading and keeping multiple records, including extended (length > 1) ones, and will automatically drop the records when they finally go out of scope. For one-line-per-record, multiple-records-per-site RODs, there is no need anymore for the hack used so far that involved passing ROD's own implementation of iterator through reflection mechanism (though it will still work)
* RODRecordList:
the ROD system (its iterators) will now always return a LIST of all RODs available at current position or at current query interval (see below). This class is a trivial wrapper for a list of ROD objects, with added location argument for the whole collection. The location of the RODRecordList is where the ROD system is currently sitting at: a single, current base on the reference (if next() traversal is performed), or the location of the query interval when returned by seekForward() (see below). The ROD objects themselves will have their locations set according to the original data in the file. Hence, perusing the above example of a dbSNP indel at Z:510-525 and SNP at Z:515, when moving to the position Z:515 the ROD system will return a RODRecorList with location Z:515, and with two ROD objects packaged inside, one with location Z:510-525, the other with Z:515.
*RODRecodIterator:
Almost identical to old SimpleRODIterator used by ReferenceOrderedData; this is a low-level iterator that walks over records in the data file (with a callback to ROD's ::parseLine() to parse real data)
*SeekableRODIterator:
a decorator class that wraps around Iterator<ROD> (such as RODRecordIterator) and makes the data traversable by reference position, rather than record by record. This is reimplementation of the old RODIterator. SeekableRODIterator's ::next() moves to the next position on the ref and returns all RODs overlapping with that position (as a RODRecordList). This iterator also adds a seekForward(loc) operation, that allows fast forwarding to a specified position or interval. Length > 1 query arguments (extended intervals) are fully supported by seekForward(), the returned RODRecordList wil contain all RODs overlapping with the specified interval, and the location of the returned RODRecordList object will be set to that query interval. NOTE: it is ILLEGAL to perform next() after a seekForward() query with length > 1 interval. seekForward() with point-like (length=1) interval reenables next().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1650 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 15:58:37 +00:00
ebanks
66a4de9a1d
Genotype check should be case-insensitive
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1649 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 03:23:30 +00:00
hanna
c186a49d55
Time for a reorganization. Repackage generally useful alignment classes lower in the package structure, and create a subpackage for bwa-specific code. Repackage BWA alignment code away from BWT representation. Isolate byte- and word-packing streams in another package that will ultimately be killed off en masse.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1648 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 23:28:47 +00:00
hanna
b4df089b59
Putting some of the required data structures together for imperfect lookup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1647 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 22:43:11 +00:00
hanna
355136928e
Play nice with other jobs in this VM -- don't close stdout / stderr.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1646 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 18:55:08 +00:00
sjia
0e73b2ba8e
Use population allele frequencies to distinguish between top candidates
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1645 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 15:49:19 +00:00
chartl
534486a254
Output formatting changed:
...
- summary output now reported as a percentage rather than proportion; 2 sigfigs
- fixed minor bug where FNR was calculated over total calls rather than total variant sites
- column headers are_now_contiguous_strings
- spacing fixed
- "No Call" separated from "Ref Call" as its own column
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1644 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 14:00:25 +00:00
depristo
73bec6f36d
Now uses expanding array list for coverage histograms. No hard limit on maximum depth now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1643 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 23:27:25 +00:00
chartl
4ad46590a3
Changes to PooledGenotypeConcordance:
...
Additional output & better output formatting. It has now undergone a good five hours of testing; and for pools of size 1 outputs exactly the same statistics as GenotypeConcordance (when GenotypeConcordance is modified to do nothing on reference='N'); and for pools of many sizes outputs close to the expected (by genetics) statistics. Looks like this is working properly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1642 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 21:45:01 +00:00
chartl
386a6442ba
Actually deleted now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1641 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 20:28:06 +00:00
chartl
8fce376792
Changes:
...
Deletion: PooledGenotypeConcordanceNew
Rewrite: PooledGenotypeConcordance. It works, and is blazing fast compared to the earlier version (1 order of magnitude speedup)! And is now entirely non-hackey, as opposed to before when there were some hacky bits.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1640 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 20:22:16 +00:00
asivache
3e289fcaa4
A little piece that PairMaker needs in order to compile ;)
...
Iterates synchronously over two (name-ordered) single-end alignment SAM files with, possibly, multiple alignments per read and for each read name encountered returns pairs<all alignments for end1, all alignments for end2>
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1639 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 19:17:40 +00:00
asivache
2f29cf59ba
Very early, half-baked version. All it can do right now is to take two SAM files with end1 and end2 individual single-end alignmnets from a pair-end run and spit out a "paired" BAM file that contains ONLY properly paired ends (both ends align uniquely && both ends align to the same chromosome && the ends align in proper orientation). Insert size is currently not used (and not set in the output). Unpaired/unmapped reads are NOT transferred into the output bam. For the pairs that do get written, the output is (should be) standard-conforming: all flags are properly set and mate pair information is correct.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1637 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 18:38:18 +00:00
ebanks
5d85bd9671
By default, VF should ask for deleted bases so that they show up in coverage.
...
The Strand filter then needs to ignore those bases when determining bias.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:46:09 +00:00
ebanks
a7c306f757
-deal with offsets that can be -1
...
-added option to have "D"s inserted for deleted bases in pileup strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1635 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:44:57 +00:00
hanna
01a9b1c63b
Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 08:33:56 +00:00
aaron
eedf55e94d
temp fix for a broken test, we'll fix the test tomorrow. We promise, we're engineers, we love our tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1633 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 04:36:42 +00:00
chartl
f6bdb47bb6
Addition:
...
@PooledGenotypeConcordanceNew - a new version of the pooled genotype concordance test for Variant Eval. Code altered to be more extensible, use a private class for handling the count tables so it doesn't gunk up the code in the test itself, and for easy debugging. The hackier methods from the original were rewritten properly. Currently computes more statistics that it outputs. Code compiles, is never called by anything, and breaks none of the tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1632 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 04:14:58 +00:00
aaron
542d817688
more cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1631 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:42:03 +00:00
hanna
9f7cf73411
Output stream management fixes. I completely screwed up the output stream management system, but cleverly masked this fact by breaking some other stream management functionality that masked the problem.
...
Sigh.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1630 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:06:45 +00:00
hanna
17758b381c
Properly initialize redirected output streams in case of out and err.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1629 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 19:47:43 +00:00
andrewk
00dfe014b7
Added option to FastaReferenceWalker to change output FASTA file format's line width and to remove header lines; allows dumping raw sequence using intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1628 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 18:00:30 +00:00
hanna
b69eb208a6
Always create output files, even if no output was written to them.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 17:58:14 +00:00
aaron
b401929e41
incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
ebanks
6783fda42a
Updated unit test to reflect changes to vcf output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1623 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 01:56:08 +00:00
andrewk
fb254759cb
Trivial: Don't print reduce result
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1621 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 23:42:20 +00:00
hanna
118071cfd8
Proof-of-concept perfect read aligner, implemented as described in sec 2.4 of BWA paper. Has successfully aligned a handful of reads. Requires significant cleanup and refactoring.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1617 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:54:56 +00:00
ebanks
01e7b39c8d
1. Don't print out values in filter field of the VCF.
...
2. Fix ratio printouts (for params file)
3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:03:39 +00:00
ebanks
436f543b3b
I owe Doug a beer for finding this:
...
don't print out intervals to be merged if they're not within the global -L intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:22:30 +00:00
chartl
7d6d114ab5
Additions:
...
@NQSMismatchCovariantWalker - Walks along the gene calculating the table
# NQS
# Q score
# mismatches at non-dbsnp sites
# total number of bases at non-dbsnp sites
And prints it out at the end.
Changes:
@PooledGenotypeConcordance now works. Takes a path to a file listing a bunch of hapmap IDs in whatever pool we want to check, reads those in, and checks for concordance by name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1614 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:12:04 +00:00
sjia
9be1832d7b
Phasing version 1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1613 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 16:10:37 +00:00
asivache
a009592662
the life in the magical kingdom of fully spec-conforming SAM files would be so... magical. For now, however, there are plenty of ways to end up with inconsistent SAM records. For instance, a SAM file with missing header will result in SAM records with ref. name set, but getReferenceIndex() returning null. This, in turn, was tripping isReadUnmapped(). The method is now fixed, so that it suffices to have *either* reference name *or* reference index set for the read to be considered mapped (the flag is still checked)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1612 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 16:04:19 +00:00
aaron
e03fccb223
Changes to switch Variant Eval over to the new Variation system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 05:34:33 +00:00
aaron
5b41ef5f70
rod DBSNP had a bug where the reference wasn't calculated correctly under certain conditions. Fixed getRefBasesFWD and getRefSnpFWD so that they were more in line with getAltBasesFWD and getAltSnpFWD. Also updated Variant Eval tests to reflect this change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1609 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 23:48:58 +00:00
chartl
5cf1d6c104
Bugfix - this walker was never changed to work with the new PoolUtils methods after those methods were changed to return ReadOffsetQuad objects rather than nested pairs. This broke the build :(.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1608 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 19:39:23 +00:00
ebanks
c669e8d5ad
Use constant seed in the random generator so we can be stable (and thus unit tests will work)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1607 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 17:40:56 +00:00
ebanks
15178977e1
Naive tool to convert from vcf to geli text
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1606 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 17:25:02 +00:00
chartl
794bd26b20
Changed some ShortNames so they made more sense.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1604 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 01:32:12 +00:00
chartl
b353bd6f81
Added a Quad toString() method.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1603 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 01:13:57 +00:00
chartl
2e237a12e9
This commit has a bunch to do with cleaning up the CoverageAndPowerWalker code: implementing some new printing options,
...
but mostly altering the code so it's much more readable and understandable, and much less hacky-looking.
ADDED:
@Quad: This is just like Pair, except with four fields. In the original CoverageAndPowerWalker I often used
a pair of pairs to hold things, which made the code nigh unreadable.
@SQuad: An extension of Quad for when you want to store objects of the same type. Let's you simply declare
new SQuad<X> rather than new Quad<X,X,X,X>
@ReadOffsetQuad: An extension of Quad specifically for holding two lists of reads and two lists of offsets
Supports construction from AlignmentContexts and conversion to AlignmentContexts (given
a GenomeLoc). There are methods that make it very clear what the code is doing (getSecondRead()
rather than the cryptic getThird() )
@PowerAndCoverageWalker: The new version of CoverageAndPowerWalker. If the tests all go well, then I'll remove
the old version. New to this version is the ability to give an output file directly
to the walker, so that locus information prints to the file, while the final reduce
prints to standard out. Bootstrap iterations are now a command line argument rather
than a final int; and users can instruct the walker to print out the coverage/power
statistics for both the original reads, and those reads whose quality score exceeds
a user-defined threshold.
CHANGES:
@PoolUtils: Altered methods to accept as argumetns, and return, Quad objects. Added a random partition method
for bootstrapping.
@CoverageAndPowerWalker: Altered methods to work with the new PoolUtils methods.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1602 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 01:00:04 +00:00
depristo
6c7a300664
Missing file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1601 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:17:09 +00:00
depristo
6e13a36059
Framework for ROD walkers -- totally experiment and not working right now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
depristo
bd75a8d168
Unused code has been removed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1599 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:12:23 +00:00
depristo
e8d544869d
Alignment context now supports the idea of skipped bases -- not currently in use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1598 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:11:38 +00:00
depristo
3ad97e4ab4
Easier to print GenomeLoc compareTo()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1597 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:10:35 +00:00
depristo
3949b4ac72
commented out version of next() and hasNext() that appear to be correct but are causing testing problems
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1596 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:09:21 +00:00
depristo
58105636c8
getBoundRods() convenience method
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1595 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:57 +00:00
depristo
4e1eded389
Fixed bad compareTo operator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1594 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:10 +00:00
depristo
17ab1d8b25
General purpose merging iterator implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1593 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:06:15 +00:00
hanna
275707f5f6
Data structure for counts, to isolate the user from wonky 'sometimes counts are cumulative, other times base-by-base' gotchas.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1592 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 20:53:24 +00:00
depristo
7c8b17b456
fix for SSG with pl name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1591 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 20:39:34 +00:00
andrewk
5354c1876c
De Novo SNP caller as presented at 1KG meeting on 9/10/09 with min LOD 5 calls required from both parents and a LOD 5 call in the daugter gold standard concordant call set. All SNP calls must be present as bound RODs.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1590 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 19:30:23 +00:00
hanna
0f3049652a
Start to build BWT abstractions, so we can present a reasonable facsimile of the BWT to the user no matter how it's represented on disk.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1589 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 18:23:15 +00:00
chartl
c3f77acd5e
Alteration to CoverageAndPowerWalker. It can now be flagged with -uc which will cause it to print not only the coverage on each strand that exceeds the quality score threshold, but also the total coverage on each strand as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1588 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 17:55:44 +00:00
chartl
d6a0b65ac9
Changes:
...
Rollback of Variant-related changes of r1585, additional PGC code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 16:23:01 +00:00
chartl
0c54aba92a
Changes:
...
@VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object.
@AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones.
@ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true.
Added:
@PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls
and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 15:01:50 +00:00
ebanks
e24c8d00d5
So, the VCF spec allows for an optional meta field in the header representing the date. However, using this field means that integration tests run on the vcf file will fail the MD5 test (which is what happened to the VariantFiltration test this morning after working just fine yesterday).
...
After consulting our resident expert (Aaron), we're going to (temporarily) remove the date from the vcf output until we can come up with a better solution. However, this shouldn't cause any short-term problems because the data truly is optional.
VF test's MD5s are updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1580 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 14:28:43 +00:00
aaron
296878e8e3
adding a basic implementation of the Variation interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1578 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:41:13 +00:00
aaron
5a64a80ab5
changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test from using the build script.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:31:33 +00:00
depristo
c988205884
Notes for Aaron in SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:18:51 +00:00
ebanks
1362a56227
Added fasta tests and small fix to cleaner test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1575 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:13:11 +00:00
ebanks
8ca89279aa
Added a test for VariantFiltration and the VECs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1574 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 02:21:14 +00:00
hanna
6de54dcd2a
Higher-level readers and writers for BWTs and suffix arrays.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1573 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 22:45:32 +00:00
depristo
0093482c62
N reference base fix for SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1572 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 21:19:36 +00:00
hanna
bc9fe31cf5
Cleanup of int-packed file readers / writers. All primitive writers for BWTs and SAs are in place; time to move on to compound reader / writers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1571 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:36:39 +00:00
asivache
d9f3e9493f
Does not return 0-length cigar elements anymore (used to do so when previous cigar element ended exactly at the segment boundary)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1570 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:05:55 +00:00
ebanks
cb31d5a0ab
VariantFiltration now outputs VCF. Important changes:
...
1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD.
2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF.
If there are any problems, go find Aaron.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:32 +00:00
asivache
dd0085c428
1) now is tolerant to sloppy cigar strings with 0-length elements (at the price of extra recursive call)
...
2) when reads with deletions are requested, adds to the pile just those: reads with 'D' over the current reference base, but not 'N'
3) next() now implements a loop: recursive forward iteration calls to next() until ref. position with non-zero coverage is encountered were OK for (short) deletions, but with long stretches of N's they end up with stack overflow
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1568 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:04 +00:00
ebanks
542af6402e
output correct format for Sequenom SNPs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1567 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 19:21:53 +00:00
hanna
43d1c6741c
Cleanup. Separate common packing functionality into utils class. Make base packing utility as generic as possible.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1566 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:54:12 +00:00
kiran
3b1e966b4c
Lowercases the sequencing platform so that a difference in case doesn't lead to the failure to look up an entry in the hash.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1565 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:35:45 +00:00
kiran
d82d6c0665
Excludes variants that fall below a certain LOD that changes as a function of depth.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1564 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:34:16 +00:00
kiran
06eae52292
Throws an exception if you attempt to use a filter that doesn't exist.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1563 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:33:27 +00:00
asivache
1060b36288
Bug fix: 'N' cigar elements now treated properly; for all practical intents and purposes, N is the same as D and should be treated as such, the difference is only in logical interpretation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1562 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:08:35 +00:00
ebanks
bed646e4f6
Adding cleaner test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1561 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 16:05:56 +00:00
chartl
9c7f456510
Changed the short name on the PoolSize cmd line argument
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1560 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:53:22 +00:00
chartl
9d69bd2c84
Modifications:
...
@CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position
@VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As now, the variable is unused.
@AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them
New Additions:
Filtered Contexts
Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as
part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes
it a bit easier, as each of these objects are children of AlignmentContext, and can be instantiated from a "raw" AlignmentContext.
@FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering
those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can
be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method.
@ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction
@ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction
@QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided).
A unit test bamfile and associated unit tests for these are in the works.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:49:52 +00:00
depristo
d9588e6083
bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:36:12 +00:00
asivache
0721c450c2
Bug fix: single unmapped read now keeps mapping qual 0 after remapping, not 37!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1557 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:29:34 +00:00
asivache
df11618092
Set default value of useLocusIteratorByHanger to FALSE. Otherwise the -LIBH flag is useless and there'd be no wayto "unset" the 'true' value. Old version was (always) using LocusIteratorByHanger. Now default iterator is indeed LocusIteratorByState, and -LIBH will switch back to the old one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1556 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:09:09 +00:00
aaron
0df6a9da5c
-Seperating out normal (unit) tests and integration tests. From now on if your test are more of an integration test (i.e. you're testing a walker and all the subunits it relies on) please name the test "______IntegrationTest.java" instead of "______Test.java".
...
-Bamboo will now run the integration tests once a day, and the normal units tests on each check-in.
-Also added a bunch of unit tests for VariantEval walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1555 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:01:40 +00:00
depristo
eeb9b6eb13
GenotypeLikelhoods now support a cache per subclass, avoiding genotyping clashes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1554 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 10:39:14 +00:00
ebanks
0cc219c0df
-Added unit test for walkers dealing with intervals for cleaning
...
-I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been. Hooray for unit tests!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 02:35:17 +00:00
depristo
ec0f6f23c7
LocusIterationByState is now the system deafult. Fixed Aaron's build problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
aaron
ea6ffd3796
initial VariantEvalWalker test. More to be added soon...
...
Also fixed the case where MD5 sums had leading zero's clipped off
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1551 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:02:04 +00:00
hanna
adce3bd536
My reference implementation is now generating a BWT which matches BWT-SW's.
...
Note to self: never give project status in an svn log.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1550 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 22:11:03 +00:00
hanna
f22f590192
Successfully writing .sa files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1549 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 17:34:34 +00:00
sjia
600c234643
Starting code on phasing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1548 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 15:20:38 +00:00
aaron
3276e01e5f
fixing the build
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1546 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 13:13:55 +00:00
kiran
f963cfcb21
Made enum listing header fields public.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1545 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 06:12:59 +00:00
kiran
fd20f5c2e8
For a file or files backed by a ROD implementing AllelicVariant, outputs a VCF file summarizing the information. Metadata like Hapmap and dbSNP membership, genotype LOD, read depth, etc, are annotated appropriately. The results output by this program are equivalent to those given by Gelis2PopSNPs.py.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1544 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 06:12:18 +00:00
ebanks
4a95f2181d
print out the right variant
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1543 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 01:37:35 +00:00
sjia
5791da17ae
Updated to reference HLA database of unique 4 digit alleles
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1542 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 22:12:56 +00:00
ebanks
5dbba6711c
Lots of changes: (I'll send email out in a sec)
...
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run though it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mox up values (like I had been doing).
3) Have indel rod print samples
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:12:09 +00:00
depristo
1c3d67f0f3
Improvements to the CountCovariates and TableRecablirator, as well as regression tests for SLX and 454 data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
depristo
2b0d1c52b2
General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:13:37 +00:00
sjia
471ca8201e
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1537 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:12:46 +00:00
aaron
0cc634ed5d
-Renamed rodVariants to RodGeliText
...
-Remove KGenomesSNPROD
-Remove rodFLT
-Renamed rodGFF to RodGenotypeChipAsGFF
-Fixed a problem in SSGenotypeCall
-Added basic SSGenotype Test class
-Make VCFHeader constructors public
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 18:40:43 +00:00
ebanks
fd1c72c151
Fixed package name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1535 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:40:06 +00:00
ebanks
6c476514f8
Moved to core. Wiki pages are going up; unit tests will be written soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1533 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:09:11 +00:00
ebanks
42c71b4382
Fix for Kris: now SNPs aren't masked by default (only when they come from a mask rod) and we can design Sequenom validation assays for them.
...
I'll move this all to core in a bit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1532 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 14:52:06 +00:00
ebanks
849dce799d
This rod was all wrong for generating the alternate snp alleles (it returned null or even the wrong value); fixed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1531 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 14:21:46 +00:00
depristo
a08c68362e
Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* the compares the geli MD5 sum to the expected one!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 12:39:06 +00:00
aaron
3c2ae55859
changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 05:31:15 +00:00
ebanks
2241173fff
In order to help learn python, I decided to convert Michael's DoC python script to Java; the CoverageHistogram now spits out standard deviations for a good Gaussian fit.
...
This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1528 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 02:23:57 +00:00
chartl
544900aa99
Migration of some core calculations (log-likelihood probabilties, etc.) from CoverageAndPowerWalker into static methods in PoolUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1527 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 21:43:29 +00:00
chartl
93cedf4285
---------------
...
| Added items |
---------------
@/varianteval/PoolAnalysis
Interface to identify variant analyses that are pool-specific.
@/varianteval/BasicPoolVariantAnalysis
Nearly the same as BasicVariantAnalysis with the addition of a protected integer (numIndividualsInPool)
which holds the pool size. One soulcrushing change is that "protected String filename" needed to
become "protected String[] filename" since now multiple truth files may be looked at. It was tempting
to make the change in BasicVariantAnalysis with some default methods that would maintain usability of
the remainder of the VariantAnalysis objects, but I decided to hold off. We can always merge these
together later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1526 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 21:26:04 +00:00
sjia
ee06c7f29f
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1525 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:41:12 +00:00
sjia
043c97eede
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1524 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:34:42 +00:00
aaron
c849282e44
reverting the HLA walker changes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1523 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:11:57 +00:00
asivache
5202d959bf
NM attribute changed in sam jdk (?) from Integer to Short, or maybe it is presented differently by the reader depending on whether SAM or BAM is processed; in any case, both Integer and Short are safe now
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1522 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 19:03:32 +00:00
sjia
ada4c5a13c
Small change to debug printing code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1521 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 18:31:21 +00:00
kiran
c3aaca1262
Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into it's own SAMFileReader-like entity.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1520 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 17:20:16 +00:00
asivache
499b3536a4
Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping"
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1519 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 16:45:07 +00:00
ebanks
5bd99fc1c4
VariantFiltration moved to core.
...
Another win for the team.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1517 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:41:41 +00:00
chartl
5130ca9b94
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1516 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:17:02 +00:00
depristo
bdd0a6f9fa
change to make build work
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1511 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 13:43:10 +00:00
depristo
b01ac9de0c
High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and more potentially) speed ups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still testing but passes validing pileup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 03:06:25 +00:00
jmaguire
e2780c17af
Checkin of the Multi-Sample SNP caller.
...
Doesn't work yet; same command I used to use now causes GATK to throw an exception.
Will check with Matt & Aaron tomorrow, then do a regression test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1509 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 00:23:28 +00:00
hanna
e2a79c5cd9
Checkpoint. The BWT that we generate now matches the first 16% of the BWT that BWT-SW generates. Cleaned up output streams to separate the byte packing / word packing from the data structure generation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1508 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 22:18:17 +00:00
ebanks
3dfc77dc89
Add an indel rod which represents the initial point of the indel only
...
(useful for alternate reference making)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1507 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 19:32:29 +00:00
asivache
58debd7e56
A convenience shortcut isReadUnmapped() added: thanks to SAM format specification, 'read unmapped' flag is not always required to be set for an unmapped read; this method checks both the flag and the alignment reference index/start (if those are set to '*' the flag is not required according to the spec!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1506 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 17:00:39 +00:00
aaron
0e6feff8f2
fixed locus pile-up limiting problem
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1505 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 16:56:44 +00:00
hanna
d8aff9a925
Bug fixes. Was ignoring the '$' character in a few places where I shouldn't have been.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1504 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 16:27:31 +00:00
ebanks
55013eff78
Re-revert back to point estimation for now. We need to do this right, just not yet.
...
Also, it's safer to let colt do the log factorial calculations for us.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1503 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 15:33:18 +00:00
hanna
1ada085970
Cruddy implementation of BWT creation, for understanding and testing purposes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1501 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 02:16:56 +00:00
ebanks
24d809133d
Oops - comment out the printouts
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1500 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 01:45:56 +00:00
ebanks
91ccb0f8c5
Revert to having these filters use integration over binomial probs
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1499 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 01:40:22 +00:00
aaron
05c164ec69
changing the default behavior to allow any sized read pile-up (which may exceed the memory limit); the user can then select their own read limit. The default of 100K was arbitrary.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1498 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 14:46:00 +00:00
ebanks
54c0b6c430
Allow this ROD to consist of just the positions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1497 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 12:43:18 +00:00
aaron
4a1d79cd7b
added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads.
...
The user is warned if a locus exceeds this threshold, and no more reads are added.
Also CombineDup walker had an incorrect package name.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 04:21:58 +00:00
ebanks
0addae967a
IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1495 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 03:34:39 +00:00
hanna
85ca68fab6
Initial version: creates a packed file from a fasta, suitable for consumption by BWT-SW. Works with E coli fasta, but will not work at this moment with multi-chr fastas.
...
Will be made into a utility routine when BWA comes together.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1494 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 18:39:19 +00:00
asivache
591f8eedbb
Added setName() and getName() (however, not used anywhere yet). Now can set the name of the fasta record manually to whatever, however it will work only if done early enough. If the fasta record already started printing itself (i.e. the header line is already done), setName() will throw an exception. Could be too entangled, may reverse this back...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1493 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 18:09:55 +00:00
asivache
c9eb193c7f
Now recognizes a special name for a bound rod track: snpmask. If a rod with this name is bound, then ONLY snps from that track will be used (to set alt reference bases to N's), but indels will be ignored. This helps when an alt. ref has to be created for a set of indel calls, and another rod (e.g. dbSNP) is used to put N's in (for sequenom). If dbSNP rod is not marked as "snpmask", the indels reported there will make their way into the alt. reference output and mess it up.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1492 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 18:05:57 +00:00
ebanks
8e3c3324fa
Added filter for SNPs cleaned out by the realigner.
...
It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1489 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:32:32 +00:00
ebanks
8bc7afe781
Smarter SW penalties
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1488 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:29:19 +00:00
ebanks
463f80c03e
Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1487 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 03:37:24 +00:00
ebanks
1a299dd459
Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1486 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 03:31:37 +00:00
ebanks
e70101febc
Add a VEC filter for clustered SNP calls that takes advantage of the new windowed approach; delete the old standalone walker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1485 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 03:14:42 +00:00
ebanks
215e908a11
Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters.
...
This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 02:16:39 +00:00
depristo
813a4e838f
Removing old code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1482 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:27:11 +00:00
depristo
49a7babb2c
Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:16:30 +00:00
depristo
522e4a77ae
Caching support across multiple technologies
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1480 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 18:10:14 +00:00
depristo
5af4bb628b
Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1479 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 17:34:43 +00:00
depristo
6ab9ddf9f5
Significant output formatting improvements. SNPs as indels analysis. heterozygosity rate calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1478 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-29 21:49:09 +00:00
depristo
bde67428fd
Better formatting of the code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1477 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-29 21:46:47 +00:00
aaron
8331c195fb
changed the full name of maximum_reads to maximum_iterations for consistancy
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1475 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 16:03:46 +00:00
depristo
8e129d76fd
Support for original quality scores OQ flag. pQ flag in TableRecalibation to preserve quality scores below a threshold (defaulting to 5)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 14:14:21 +00:00
depristo
f0179109fa
Removing min confidence for on/off genotype
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1473 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 01:04:13 +00:00
depristo
4f7ed69242
toString() implemented
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1472 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 01:03:58 +00:00
depristo
dc9d40eb9a
Now requires a minimum genotype LOD before applying tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1471 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 00:19:23 +00:00
depristo
37a9b84276
corresponding test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1470 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 00:17:42 +00:00
depristo
bf60980653
Experitmental support for empirical P(B_true | B_miscall). --useEmpiricalTransitions flag to SSG enables this support. Much better implementation of Genotype likelihoods -- the system should scream along now. Continuing progress towards deleting old model
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1469 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 00:17:24 +00:00
depristo
7cf9a54b64
change for new char/byte in BaseUtils
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1467 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 23:47:56 +00:00
depristo
a639459112
Trival consistency change from char in to char out, not char in to byte out
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1466 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 23:37:37 +00:00
chartl
6012f7602b
@ minor fixes to CoverageAndPowerWalker and AnalyzePowerWalker (switching to By Reference traversal, spitting out Syzygy position for sanity check)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1465 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 21:44:18 +00:00
chartl
bd1e679bc5
@ Fixed issues with AnalyzePowerWalker which depended on CoverageAndPowerWalker. The latter was changed but not the former. Now fixed
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1464 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 20:23:41 +00:00
kiran
a17dad5fa9
Converts from fastq.gz to unaligned BAM format. Accepts a single fastq (for single-end run) or two fastqs (for paired-end run). Also allows you to set certain BAM metadata (read groups, etc.).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1463 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 20:20:09 +00:00
chartl
8740124cda
@ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0.
...
@MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation.
@PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers
independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups
by read direction.
@CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than
doing that calculation within the object itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 19:31:53 +00:00
chartl
1da45cffb3
New:
...
Minor changes to CoverageAndPowerWalker bootstrapping (faster selection of indeces).
Entirely new Aritifical Pool Walker (ArtificialPoolWalkerMk2), will likely replace ArtificialPoolWalker on the next commit. Adapted the method of sampling, and added a helper context class: ArtificialPoolContext which carries much of the burden of calculation and data handling for the walker. The walker itself maps and reduces ArtificialPoolContexts.
Cheers!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1461 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-26 21:42:35 +00:00
chartl
92ea947c33
Added binomProbabilityLog(int k, int n, double p) to MathUtils:
...
binomialProbabilityLog uses a log-space calculation of the
binomial pmf to avoid the coefficient blowing up and thus
returning Infinity or NaN (or in some very strange cases
-Infinity). The log calculation compares very well, it seems
with our current method. It's in MathUtils but could stand
testing against rigorous truth data before becoming standard.
Added median calculator functions to ListUtils
getQScoreMedian is a new utility I wrote that given reads and
offsets will find the median Q score. While I was at it, I wrote
a similar method, getMedian, which will return the median of any
list of Comparables, independent of initial order. These are in
ListUtils.
Added a new poolseq directory and three walkers
CoverageAndPowerWalker is built on top of the PrintCoverage walker
and prints out the power to detect a mutant allele in a pool of
2*(number of individuals in the pool) alleles. It can be flagged
either to do this by boostrapping, or by pure math with a
probability of error based on the median Q-score. This walker
compiles, runs, and gives quite reasonable outputs that compare
visually well to the power calculation computed by Syzygy.
ArtificialPoolWalker is designed to take multiple single-sample
.bam files and create a (random) artificial pool. The coverage of
that pool is a user-defined proportion of the total coverage over
all of the input files. The output is not only a new .bam file,
but also an auxiliary file that has for each locus, the genotype
of the individuals, the confidence of that call, and that person's
representation in the artificial pool .bam at that locus. This
walker compiles and, uhh, looks pretty. Needs some testing.
AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power
calcuations (e.g. from Syzygy) and print them to the output file as well for direct
downstream comparisons.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 21:27:50 +00:00
kiran
478f426727
Fixed a missing method implementation in these two files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1459 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 21:21:58 +00:00
kiran
f12ea3a27e
Added ability for all filters to return a probability for a given variant - interpreted as the probability that the given variant should be included in the final set. The joint probability of all the filters is computed to determine whether a variant should stay or go. At the moment, this is only visible in verbose mode (specify -V). Also removed 'learning mode'; now, filters emit important stats no matter what. Various code cleanups.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1458 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 21:17:56 +00:00
hanna
e5115409fa
Force columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1457 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 19:54:54 +00:00
aaron
811503d67b
vcf changes from Richards comments, fixed a test case
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1456 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 14:32:16 +00:00
hanna
ccdb4a0313
General-purpose management of output streams.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
aaron
b316abd20f
catch a malformed column header name more gracefully
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1453 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 21:05:28 +00:00
aaron
0364f8e989
added the ability of the VCFReader to take in compressed gzipped files natively, which is really useful for the validator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1452 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 18:40:38 +00:00
aaron
647a367680
Made the size zero interval file checker emit a warnUser if we're not in unsafe mode.
...
Also changed the default logger level from error to warn. Does anyone object? It makes sense for users to always get their warn user statements in the default logging level.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1451 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 14:40:57 +00:00
aaron
df9133c90b
the doc on File.length states it returns 0L if it doesn't exist, added a check to make sure it exists (and length < 1)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1450 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:55:17 +00:00
aaron
cd711d7697
Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric.
...
also fixed some verbage in the GAEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:35:49 +00:00
asivache
0bdecd8651
A most stupid bug. In cases when more than one indel variant was present in cleaned bam file, the "consensus" (max. # of occurences) call was computed incorrectly, and most of the times the call itself was not made at all. Fortunately, the locations where we see multiple indels are a minority, and many of them are suspicious anyway (manifestation of alignment problems?). Could change results of POOLED calls though.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1448 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 22:31:44 +00:00
aaron
6313c465fb
we want the RMS of the reads qualities not the RMS of the RMS of the read qualities.
...
Also the VCF version tag seems to be standardized as VCR. Updated the VCF code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 21:56:29 +00:00
kcibul
6c0adc9145
resuse fasta file reader
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1446 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 16:01:58 +00:00
aaron
0386e110cf
some documentation changes, add a couple of simple checks
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1445 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 05:20:27 +00:00
ebanks
10c98c418b
Walker to determine the concordance of 2 genotype call sets.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1443 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 01:32:44 +00:00
ebanks
1d74143ef4
A convenience argument - for Mark - so that you don't have to specify all the output file names
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1442 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 00:49:12 +00:00
aaron
5725de56dc
fixes in VCF, some changes to get it ready to move out of the GATK
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1441 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 23:31:03 +00:00
aaron
0b927f44fa
created a better seperation between instantiation of an VCF object and the object itself
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1440 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 20:32:50 +00:00
ebanks
ed8c92a12a
make isReference do the right thing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1439 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 20:32:29 +00:00
hanna
21091b9839
Fix for invalid format error when outputting BAM files.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1438 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 19:42:39 +00:00
aaron
4cf9110468
Adding a lot of changes to the VCF code, plus a new basic validator. Also removing an extra copy of the Artificial SAM generator that got checked in at some point.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1437 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 05:08:28 +00:00
ebanks
b3fe566c0c
Fix descriptions of walker args
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1436 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 19:46:48 +00:00
ebanks
82e2b7017e
Prevent array bounds errors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1435 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 16:54:31 +00:00
ebanks
26a6f816c9
set default value for output format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1434 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 16:17:09 +00:00
ebanks
53153fcd79
Allow RODs to specify that incomplete records are okay (i.e. that they allow optional fields)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1433 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 15:26:10 +00:00
ebanks
9b1d7921e8
added filter based on concordance to another call set
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1432 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 15:16:30 +00:00
ebanks
b2a18a9d61
- first pass at a basic indel filter (for now, based on size and homopolymer runs)
...
- fix simple indel rod printout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 03:04:12 +00:00
ebanks
78439f7305
Modify Sequenom input format based on official documentation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1430 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 01:42:57 +00:00
aaron
63d90702d6
another iteration of the VCFReader and VCFRecord, introducing the VCFWriter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1429 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 22:17:34 +00:00
jmaguire
1e8b97b560
quietly skip empty intervals files rather than crash.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1428 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 20:19:14 +00:00
jmaguire
92c63fb530
It's just "lod" not discovery_lod now.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1427 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 18:44:09 +00:00
ebanks
df5744bcd3
update this walker so any variants can be passed in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1426 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 16:30:39 +00:00
aaron
8403618846
the start to the VCF implementation
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1425 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 04:34:15 +00:00
ebanks
d4808433a1
Added option to output the locations of indels in the alternate reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1424 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-16 03:46:36 +00:00
ebanks
4b6ddc55bd
Merge our 2 fastq writers into 1: incorporate Kiran's secondary-base file writer into the fasta/fastq writers
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1423 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-14 20:55:23 +00:00
ebanks
0ec581080c
Refactoring the code; also, now it prints continuously instead of potentially storing one long string.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1421 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-13 01:32:46 +00:00
asivache
2a01e71277
A very simple standalone filter for fooling around with the data: can extract only mapped or only unmapped reads, only reads with mapping quals > X, reads with average base qual > Y, reads with min base qual > Z, reads with edit distance from the ref > MIN and/or < MAX
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1420 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:28:51 +00:00
asivache
ebec0ec171
A standalone companion to BamToFastqWalker: does the same thing but without calling in gatk's heavy artillery (does not "require" a reference either). Extracts seqs and quals and places them into fastq; along the way it also reverse complements reads that align to the negative strand (so that fastq contains reads as they come from the machine).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1419 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:24:37 +00:00
asivache
112a283f54
be nice, don't forget to close the reader when done
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1418 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:19:56 +00:00
asivache
ba2a3d8a58
Reverse qualities when read seq. is reverse complemented
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1417 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:17:35 +00:00
asivache
144b424933
Added : String reverse(String s) - reverses a string
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1416 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:16:22 +00:00
ebanks
143f8eea4e
option to output in sequenom input format
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1415 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 16:50:37 +00:00
ebanks
7f1159b6a9
Added option to mask out SNP sites with "N"s in the new reference.
...
This is useful when producing Sequenom input files for validating indels...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1414 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 15:17:45 +00:00
ebanks
43f63b7530
Added a walker to convert a bam file to fastq format (including the option to re-reverse the negative strand reads).
...
Picard has such a tool but it is geared towards their pipeline and requires intimate knowledge of the lanes/flowcells,etc. This is just easy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1413 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 15:10:40 +00:00
aaron
d101c20b30
added the ability to pass in a csv file of ROD triplets (one triplet per line) to the -B option
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1412 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 22:10:20 +00:00
asivache
e4acd14675
Now GenomicMap maps (and RemapAlignment outputs) regions between intervals on the master reference as 'N' cigar elements, not 'D'. 'D' is now used only for bona fide deletions.
...
Also: do not die if alignment record does not have NM tags (but mapping quality will not be recomputed after remapping/reducing for the lack of required data)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1411 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 21:10:17 +00:00
ebanks
2c3f56cb8d
fix length calculation (it was including +/- char when it shouldn't)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1410 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 20:28:24 +00:00
ebanks
5fab934f4e
- moved the reference maker to its own directory
...
- added first version of a more complicated reference maker which takes in RODs and creates an alternative reference based on the variants (indels and/or SNPs)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1409 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 18:01:06 +00:00
aaron
d69ae60b69
fixed two tests affected by my previous commit
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1408 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 17:57:50 +00:00
aaron
fc1c76f1d2
fixing a bug where reads in overlapping interval based locus traversals could get assigned to only one of two the regions
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1407 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 17:50:16 +00:00
sjia
1851613de4
Now using larger database of HLA alleles
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1405 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 03:11:14 +00:00
hanna
dd228880ed
Partially implemented NewHotnessGenotypeLikelihoodsTest caused the tests to fail.
...
Ouch! So hot it burned me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1403 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:45:44 +00:00
asivache
3208eaabcc
A standalone picard-level tool for breaking individual reads into "pairs" of first/last N bases. Supports:
...
* splitting off only start or end of the read, or both; the output will contain
chopped sequences AND corresponding base qualities
* splitting arbitrary number of bases off each end (different numbers
for left and right segments can be specified; segments can overlap)
* splitting only unmapped reads, ignoring mapped ones
* writing splitted ends into separate sam/bam files, or into a single output file
* decorating original read names with user-specified suffixes for each end
(e.g. _1 and _2 for left and right parts of the read); default: no decoration,
original read names are used
* when mapped reads are split, the alignment cigars are chopped appropriately
and the alignment start positions are adjusted (for the right end) to correctly
specify the alignment of the selected part of the read
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1402 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:42:49 +00:00
asivache
36312ae4b2
tiny cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1401 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:26:52 +00:00
ebanks
ecae619a1b
warn user when dbSNP rod looks suspicious
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1400 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:20:20 +00:00
asivache
2841e151d0
javadoc comments only
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1399 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 18:44:35 +00:00
asivache
921d4f4e95
RemapAlignments is a standalone picard-level tool that does not use gatk engine; moved to 'tools'
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1396 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 15:41:07 +00:00
ebanks
02f1af0743
Don't die when a readgroup is absent from the covariates table - it could
...
happen when all reads are unmapped (or have MQ0); instead, just don't alter
the quals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1394 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 03:10:33 +00:00
depristo
089dab00e2
Was discordance rate, now concordance rate
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1393 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:37:52 +00:00
depristo
6d3ef73868
Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:37:07 +00:00
depristo
20baa80751
Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1391 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:01:04 +00:00
depristo
a864c2f025
Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1390 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:00:06 +00:00
ebanks
db250f8d3e
Don't print if not in learning mode
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1389 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 06:08:02 +00:00
ebanks
4c1fa52ddf
-Added mapping quality zero filter
...
-Set some reasonable defaults (based on pilot2)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1388 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 03:18:02 +00:00
depristo
bbd7bec5db
Continuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1387 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 22:25:30 +00:00
sjia
d60d5aa516
Fixed bug: previously reset likelihoods after each region/exon.
...
Better comments/documentation added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1386 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 18:44:46 +00:00
kcibul
0d47798721
made booster distance a parameter
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1385 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 18:29:21 +00:00
ebanks
3b74b3ba74
print out ref/alt ratio, not major/minor
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1384 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:36:25 +00:00
hanna
48713e154c
Windowed access to the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
depristo
65e9dcf5b7
Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which we being counted against everyone (5) likely found a stats bug in which heterozyosity was being applied incorrectly to the genotype priors
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 01:00:55 +00:00
depristo
4dc23f2763
Trivial formatting changes as I moved more legacy code into this system
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1381 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:54:26 +00:00
depristo
34af669dbb
Explicit ENUM representation of the diploid genotypes. Please use this from now on to represent strings like AA or AT
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1380 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:53:43 +00:00
depristo
5487ab0ee6
Added several useful routines to MathUtils for summing and bounds checking of doubles
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1379 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:41:31 +00:00
sjia
68309408e4
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1378 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:23:01 +00:00
sjia
45ab212f22
Post-presentation update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1377 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:21:12 +00:00
hanna
21d1eba502
Cleaned division of responsibilities between arguments to map function. Reference has been changed
...
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00