gatk-3.8

Commit Graph

Author	SHA1	Message	Date
rpoplin	b89b9adb2c	misc code cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2190 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 21:16:00 +00:00
depristo	e793e62fc9	minor code cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2189 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:57:20 +00:00
rpoplin	4969cb1957	CountCovariates uses new optimized ReadBackedPileup. It is also smarter about re-doing calculations for the dbsnp variation rate sanity check. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2188 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:35:40 +00:00
ebanks	add2fa7ab4	more use of new ReadBackedPileup optimizations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2187 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:04:01 +00:00
chartl	f5fe28cc28	Another matlab script -- this time for making power and coverage plots over a specific gene region. Lots of fun file reading, string manipulation, and exploration of the set() function git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2186 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:02:25 +00:00
rpoplin	817e2cb8c5	Recalibrator makes use of the new GATKSAMRecord wrapper and now no longer has to hash the SAMRecord. Covariate's getValue method signature has changed to take the SAMRecord instead of the ReadHashDatum. ReadHashDatum removed completely. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2185 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 19:59:17 +00:00
ebanks	e9a8156cfb	Use new optimized ReadBackedPileup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2184 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 18:17:18 +00:00
rpoplin	d8146ab23d	Changed the format of the recalibration csv file slightly so that it is easier to load the file into something like R and look at the values of the covariates. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2183 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 17:55:23 +00:00
ebanks	a184d28ce9	Completing the optimization started by Matt: we now wrap SAMRecords and SAMReadGroupRecords with our own versions which cache oft-used variables (e.g. platform, readString, strand flag). All walkers automagically get this speedup since the wrapping occurs in the engine. I note that all integration/unit tests pass except for BaseTransitionTableCalculatorJava, which is already broken. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2182 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 17:39:29 +00:00
chartl	d5cda76c8b	A new subversion repository for matlab scripts, so they don't go away in case of catastrophic Yang failure. Might be useful if you want to learn matlab or make use of matlab scripts. It does have some advantages over R (and some disadvantages as well) This script loads data from the FHS directory listing the average DoC per exon per gene per pool and makes a gene-by-gene box plot of the resulting data, broken up between pilot and production. It will also automatically save the resulting figures if you un-comment two lines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2181 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 16:25:07 +00:00
depristo	af22ca1b47	Bug fixes for VariantEval. dbCoverage now reports dbSNP rate, not some weird eval_snps_in_db as before. We now separate non-indel and non-snp db sites in dbcoverage. Some dbSNP records don't fit into these two categories. Also fixed a consistency issue where novel / known sites were being determined solely by whether dbSNP had a record there, rather than the stricter dbcoverage screen for isSNP(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2180 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 01:39:01 +00:00
depristo	2ea93385be	Better support for comparison to truth. Now emits FP rates for each covariate if a truth file is provided. Also now writes out a detailed recal.log file that can be parsed directly into R git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2179 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-29 22:20:40 +00:00
chartl	662bbbd53b	Awful stupid bug. This will use up one of my bad code offsets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2178 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-29 20:09:33 +00:00
chartl	fa2d564f2c	And the compulsory one-second-later fix -- better handling of arguments (e.g. for calling from outside of /trunk/python/) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2177 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-29 20:02:43 +00:00
chartl	45673d7851	A quick and dirty script that, given a list of input VCF files, will output a new VCF file which looks identical to the first VCF file of the input list, except that the info field has been updated to reflect the union of all the INFO annotations across the VCF files Note: this is primarily for use with two files with mostly disjoint annotations. It views "SB=2.5" as a different info field than "SB=2.2" and so will output as info "SB=2.5;SB=2.2". That is, it compares the full field string, rather than only the field name. Usage: ./mergeVCFInfoFields I=[comma-delimited list of files] O=[output file] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2176 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-29 20:01:29 +00:00
chartl	27651d8dc2	Oops. numReads is now called size git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2175 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-29 06:59:17 +00:00
chartl	21744e024b	Quick walker that determines % of bases covered at (user - defined depth)x . I've been maintaining it in my directories alone, but now that i've accidentally deleted it twice, into playground it goes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2174 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-29 06:51:19 +00:00
hanna	3300ca906a	An iterator for Eric to use when injecting his new wrapping reads -- a stopgap solution for getting additional caching functionality into a SAMRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2173 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 22:25:52 +00:00
rpoplin	26db15be5c	Added SingleReadGroupFilter to only use reads from a specific read group, filtering out all others. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2172 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 20:33:59 +00:00
rpoplin	91f5672a32	misc cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2171 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 19:56:20 +00:00
rpoplin	d1298dda13	Encapsulated the sections of code that were shared by the two Recalibration walkers. This includes both the shared command line arguments and the section of code in the map methods which pull out data from the SAMRecord and stuff it into the ReadHashDatum. Command line arguments are now passed to the Covariates using a new initialize method that all Covariates must implement. Updated the dbsnp sanity check warning message to be less cryptic. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2170 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 19:54:10 +00:00
depristo	75b61a3663	Updated, optimized ReadBackedPileup. Updated test that was breaking the build -- it created a pileup from reads without bases... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2169 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 23:30:39 +00:00
depristo	65da04ca85	Now uses the theoretically correct relationship between SNP FP and TP ratios for Illumina data. maxQ score for a snp is now 60 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2168 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 22:08:12 +00:00
alecw	49f020b0b9	Add to svn:ignore git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2167 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 21:42:40 +00:00
alecw	ac1b289d55	Add tile to ReadHashDatum, and implement TileCovariate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2166 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 21:41:42 +00:00
depristo	db40e28e54	ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 20:54:44 +00:00
rpoplin	b44363d20a	Removed silly casts from Integer to int. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2164 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 19:59:21 +00:00
alecw	84921b18ed	Push version number of picard-private-parts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2163 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 19:28:57 +00:00
alecw	5f2801e015	Push version number of picard-private-parts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2162 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 19:28:42 +00:00
alecw	d563f4bd2c	Add IlluminaUtil to picard-private.jar git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2161 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 19:25:21 +00:00
ebanks	d0f673f0c0	Use Math.abs so we don't get (inconsistent) -0's git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2160 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 19:08:34 +00:00
rpoplin	fe8809d12a	Added Covariate classes to the Early Access binary git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2159 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 18:59:21 +00:00
ebanks	e8bb88e33d	Updated package for early access git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2158 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 18:08:12 +00:00
rpoplin	6ff8526592	Added arguments to the recalibration walkers so the user can specify the default read group id and platform to use when a read has no read group. There are also options to force every read group and every platform to be the specified values. Added integration tests that use a bam file with no read groups. Added comments to all the covariates to explain what each of the methods in the Covariate interface are used for. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2157 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 15:41:12 +00:00
aaron	cfbd9332b0	small cleanups for the GATK paper genotyper; switched to the managed output system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2156 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 08:04:13 +00:00
ebanks	e1e5b35b19	Don't have the spanning deletions argument be a hard cutoff, but instead be a percentage of the reads in the pileup. Default is now 5% of reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2155 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 04:54:44 +00:00
depristo	03342c1fdd	Restructuring and interface change to ReadBackedPileup. We now lower support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements so you can use the syntax for (PileupElement p : pileup) and access directly from p.getBase() and p.getQual() and p.getSecondBase(). Only a few straggler walkers use the old style interface -- but those walkers will be retired soon. Documentation coming in the AM. Please everyone use the new syntax, it's safer, and will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the change over, discovered several bugs in the second-best base code due to things getting out of sync, but these changes were resolved manually. All other integrationtests passed without modification. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 03:51:41 +00:00
ebanks	2cb3e53b0b	Verbose mode shouldn't be printing out 'NaN's and 'Infinity's git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2153 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 22:01:00 +00:00
rpoplin	c9ff5f209c	Added a CountCovariates integration test that uses a vcf file as the list of variant sites to skip over instead of the usual dbSNP rod. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2152 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:51:38 +00:00
ebanks	3484f652e7	1. Variation is now passed to VariantAnnotator along with the List of Genotypes so non-genotype calls have access to all relevant info. 2. Killed OnOffGenoype 3. SpanningDeletions is now SpanningDeletionFraction git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2151 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:47:20 +00:00
ebanks	e05cb346f3	GenotypeLocusData now extends Variation. Also, Variations should be INSERTIONs or DELETIONs (and not just INDELs). Technically, VCF records can be indels now. More changes coming git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2150 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:07:55 +00:00
rpoplin	8b30279edc	style update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2149 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 20:56:31 +00:00
rpoplin	dffa46b380	BAM files created by TableRecalibration now have the version number and list of covariates used appended to their header with a new 'PG' tag. Eventually the entire list of command line args will be put in there as well. Big thanks to Matt and Aaron. The integration test uses the --no_pg_tag so that the md5 doesn't change every time the version number changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2148 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 20:53:57 +00:00
aaron	8fbc0c8473	fix for bug GSA-234: fasta index files couldn't handle anything but letters, numbers, or spaces in the contig name git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2147 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 19:19:47 +00:00
andrewk	3fca23cd16	Added a stub treeReduce function for debugging multi-threaded execution. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2146 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 18:51:19 +00:00
rpoplin	277e6d6b32	Further optimizations of TableRecalibration. This completes my goal of having the only math done in the map function be addition, subtraction and rounding the quality score to an integer. Everything else has been moved to the initialize method and only done once. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2145 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 18:21:57 +00:00
andrewk	e4546f802c	Accumulates coverage across hybrid selection bait intervals to assess effect of bait adjacency. Requires input bait intervals that have an overhang beyond the actual bait interval to capture coverage data at these points. Outputs R parseable file that has all data in lists and then does some basic plotting. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2144 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 18:12:34 +00:00
andrewk	e5106c9924	Hybrid selection performance statistics now include counts of the number of adjacent baits (0,1,2) using OverlapDetector and optionally include assayed bait quantities input via interval lists. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2143 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 18:07:23 +00:00
ebanks	87c1860398	I'm not sure I believe it, but JProfiler claims that calling FourBaseProbs.isVerbose() was taking 5% of my runtime... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2142 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 17:00:32 +00:00
ebanks	b3f561710f	Optimizations: 1. Only do calculations in UG for alternate allele with highest sum of quality scores (note that this also constitutes a bug fix for a precision problem we were having). 2. Avoid using Strings in DiploidGenotype when we can (it was taking 1.5% of my compute according to JProfiler) UG now runs in half the time for JOINT_ESTIMATE model. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2141 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 16:27:39 +00:00

2178 Commits (b89b9adb2cac529daee8d221983b86e7dd5c29fb)
