gatk-3.8

Author	SHA1	Message	Date
ebanks	15bf014e0b	logger.info -> logger.debug (don't want to risk filling up my log on genome-wide calls) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1792 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 17:53:11 +00:00
ebanks	df8ea8f437	UG integration test. This was the old SSG test with MD5s updated. I'll need to add some multi-sample tests in a bit... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1791 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 17:43:58 +00:00
ebanks	008455915a	One way of making the integration test stop failing is to remove it... [waiting for Matt to cringe...] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1789 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 17:08:41 +00:00
chartl	f89a89ffe3	Use of AlleleFrequency as an input to PowerAndCoverage is deprecated by the new walker. Reverting to the standard "power at 1 allele" calculation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1788 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 16:07:45 +00:00
chartl	ae05f5c7ad	Fixin the header. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1787 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 15:49:28 +00:00
chartl	11ff1e09b8	A new power walker for the user to feed in a number of alleles. Call that number k. Output is: Locus Power_for_k_alleles Power_for_k-1_alleles Power_for_k-2_alleles ... Power_for_1_allele This was a request from Jason Flannick & the T2DB group. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1786 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 15:35:35 +00:00
ebanks	04fe50cadd	* We no longer have a separate model for the single-sample case. * For now, a single sample input will be special-cased in the EM model - but that will change when the EM model degenerates to the single sample output with a single sample as input. For now, the EM code for multi-samples isn't finished; I'm planning on checking that in soon. The SingleSampleIntegrationTest now uses the UnifiedCaller instead of SSG, and so should all of you. More on that in a separate email. Other minor cleanups added too. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1785 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 14:08:57 +00:00
jmaguire	32128e093a	misc. changes to get the numbers back to the baseline while keeping the speedup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1784 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 12:27:07 +00:00
jmaguire	d38a0d04b9	fix a snp mask offset error. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1783 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 12:25:40 +00:00
kiran	829e99413b	Rescores a variant after removing duplicates (defined very strictly as reads with the same start points). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1782 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 03:07:36 +00:00
hanna	fcb6a992c8	Switched IndexedFastaSequenceFile over to use memory mapping to load data rather than a loop with a small block size. Performance improvements in loading refs are extreme; segments can be loaded in <1ms. chr1 in its entirety can be loaded in 1.5sec (down from 30sec). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1781 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 00:07:15 +00:00
jmaguire	02d2492d68	Simple tool for picking sequenom probes for SNPs. Can be extended to indels if necessary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1780 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 23:46:41 +00:00
ebanks	1905b5defa	Hash by chromosome for now to reduce memory. This is a temporary solution until we decide how to retire the Injector for good. Also, with Picard's latest changes, we need to make sure we don't double-close the sam writer. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1779 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 20:06:25 +00:00
ebanks	f9a1598d75	Reformatting git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1778 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 20:03:34 +00:00
ebanks	203c626fc2	A wrapper around the GenotypeLikelihoods class for the UnifiedGenotyper. This wrapper incorporates both strand-based likelihoods and a combined likelihoods over both strands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1777 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 19:57:37 +00:00
sjia	5bdcc2b4dc	Included HLA class 2 genes in CreatePedFileWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1776 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 18:46:51 +00:00
sjia	8f896b734f	Included HLA class 2 genes in CreatePedFileWalker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1775 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 18:28:01 +00:00
aaron	f9a0eefe4b	GELI_BINARY is now functional, and can be used as a variant type in SSG (-vf=GELI_BINARY). Also fixed the max mapping quality column in both GELI output formats; we haven't been outputting it correctly until now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1774 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 18:20:34 +00:00
chartl	225b9bccc1	Modifications to NQSClusteredZScoreWalker to output empirical mismatch rates on bins by both Z-score and reported Q-score, rather than averaging over all Q-score bins for each Z-score. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1773 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-07 13:45:12 +00:00
depristo	8dd0924b37	Minor performance improvements to VariantEval -- now all of the CPU time is spent dealing with the ROD system... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1772 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 23:40:30 +00:00
aaron	4554ca1b28	more cleanup: deprecated the old genotype, corrected SNPCallsFromGenotypes' imports and two other classes that depend on it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1771 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 19:09:27 +00:00
aaron	3aec76136f	Removing the AllelicVariant interface, which is replaced by the Variation interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 17:44:24 +00:00
depristo	1bd0c3c145	variant eval allows non Variation rod objects git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1768 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 13:04:26 +00:00
aaron	66fc8ea444	GSA-182: Adding support for BED interval files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1767 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-06 02:45:31 +00:00
hanna	aec83b401d	SSG multithreading doesn't play well with some I/O changes made since I last svn up'd. Reverting until I can find the reason. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1766 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-05 19:48:57 +00:00
hanna	8a503c86b6	Code supporting SSG proof-of-concept shared memory parallelism. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1765 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-05 18:56:16 +00:00
ebanks	fb619bd593	-Refactoring: make GenotypeCalculationModel constructors empty so that they don't have to be updated every time we add a new parameter; instead put that logic in the super class's initialize method (making everything protected so that only the factory can access them) -Adding initial version of Multi-sample calculation model. This still needs much work: it needs to be cleaned up and finished. Right now, it (purposely) throws a RuntimeException after completing the EM loop. Also: -Fix logic in GenotypeLikelihoods.setPriors -Add logger to the models for output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1764 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-05 18:10:36 +00:00
sjia	98076db6b4	Modified CreatePedFileWalker to output PED file given HLA allele names git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1763 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-05 03:06:42 +00:00
hanna	56bc4fa21a	Fixed bug where not all alignments were returned if read aligned to multiple locations. Enhanced test suite to validate all alignments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1762 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-04 18:20:20 +00:00
hanna	05aa928e3e	Fix off-by-number-of-deletions issue with negative strand reads. Improved performance by factor of 2.5x. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1761 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-03 21:55:18 +00:00
chartl	7605ee500c	Idiocy! All tests were being disabled because I forgot the instanceof git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1760 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 20:04:56 +00:00
chartl	88d0890cc3	Made PooledGenotypeConcordance a standard test in VariantEval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1759 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 20:03:31 +00:00
aaron	7fc4472e6d	A big fix for MergingSamRecordIterator, where we weren't handling the comparisons of SAMRecords correctly (we weren't applying the new reference index first, so sometimes the MT contig would be ID 23, sometimes 24 in different records). Also a fix to the GLF tests, and a correction to PrintReadsWalker to remove the close() on the output source; the source handles that itself (and you get a double close). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1758 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 19:35:35 +00:00
chartl	68cb2ee54b	Tweaks to parameters for NQS analysis walkers; change to PowerAndCoverage for Jason Flannick (can input the number of alleles to compute power for, i.e. doubletons, tripletons, rather than statically checking singletons). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1757 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 19:11:27 +00:00
ebanks	7249fade05	updated git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1756 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 18:10:34 +00:00
ebanks	53a4bd7f51	A better understanding of what's going on means no need for clearing the cache git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1755 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-02 18:07:46 +00:00
aaron	e885cc4b21	changes for corrected GLF likelihood output, along with better tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-01 20:45:05 +00:00
hanna	2309d19f6f	Bug fix from Michael Ross: mark second read in sequence as second of pair. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1753 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-01 14:34:36 +00:00
aaron	2e4949c4d6	Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-30 20:37:59 +00:00
ebanks	303972aa4b	Yup, I broke the build... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1750 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-30 20:20:43 +00:00
ebanks	841d25cc44	Added ability to set the priors after construction (and requiring a flushing of the likelihoods cache) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1749 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-30 19:55:49 +00:00
hanna	665951f9f0	Support negative strand alignments. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1748 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-30 18:10:26 +00:00
hanna	d3b1732cca	Start of refactoring effort. Make construction of alignment object simpler. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1747 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-30 15:19:31 +00:00
hanna	70e1aef550	Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 22:23:19 +00:00
aaron	b1c321f161	Adjusted Genotype concordance to more accurately use the new Genotyping code, fixed the VCF rod, and temp. fix the build by reintroducing Sherman's ReadCigarFormatter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1745 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 21:28:21 +00:00
sjia	9b78a789e2	HLA Caller 2.0 Walkers: CalculateBaseLikelihoodsWalker.java walks through reads and calculates likelihoods using SSG at each base position CalculateAlleleLikelihoodsWalker.java walks through HLA dictionary and calculates likelihoods for allele pairs given output of CalculateBaseLikelihoodsWalker.java CalculatePhaseLikelihoodsWalker.java walks through reads and calculates likelihood scores for allele pairs given phase information File Readers: BaseLikelihoodsFileReader.java reads text file of likelihoods outputted by SSG FrequencyFileReader.java reads text file of HLA allele frequencies PolymorphicSitesFileReader.java reads text file of polymorphic sites in the HLA dictionary SAMFileReader.java reads a sam file (used to read HLA dictionary when in another walker) SimilarityFileReader.java reads a text file of how similar each read is to the closest HLA allele (used to filter misaligned reads) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1744 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 20:45:55 +00:00
chartl	281a77c981	Bugfix. isMismatch() was actually computing isMatch(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1743 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 20:04:59 +00:00
chartl	e28b45688c	More NQS Related Walkers to play with git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1742 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 20:01:04 +00:00
ebanks	9ef80e3c3c	One minor addition: to incorporate Pooled calling (and to be as general as possible), we allow the genotype calculation model to use rods if it wants. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1741 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 17:05:59 +00:00
ebanks	19bfe43173	First pass at a unified caller, being checked in now so Mark can give feedback if he chooses and so Matt can debug issues with the ArgumentCollection class. Some notes: 1. This design should be flexible enough to include pooled calling (for now) after discussions with Chris. 2. Using the unified caller with the SingleSampleCalculationModel emits the exact same output as SSG over all of chr20 for NA12878. Additionally, when we include the "max deletions allowed at a locus" argument (so we don't try to call SNPs at deletion sites), it removed 233 SNP calls in chr20 that were clearly indel artifacts. 3. The MultiSampleEMCalculationModel is still a work in progress and will be checked in later this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1740 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-29 16:48:15 +00:00

1484 Commits (8461cc3a22e93dd5074690fa44942e3a75802c2f)