gatk-3.8

Commit Graph

Author	SHA1	Message	Date
kiran	a17dad5fa9	Converts from fastq.gz to unaligned BAM format. Accepts a single fastq (for single-end run) or two fastqs (for paired-end run). Also allows you to set certain BAM metadata (read groups, etc.). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1463 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 20:20:09 +00:00
chartl	8740124cda	@ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0. @MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation. @PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups by read direction. @CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than doing that calculation within the object itself. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 19:31:53 +00:00
chartl	1da45cffb3	New: Minor changes to CoverageAndPowerWalker bootstrapping (faster selection of indices). Entirely new Artificial Pool Walker (ArtificialPoolWalkerMk2), will likely replace ArtificialPoolWalker on the next commit. Adapted the method of sampling, and added a helper context class: ArtificialPoolContext which carries much of the burden of calculation and data handling for the walker. The walker itself maps and reduces ArtificialPoolContexts. Cheers! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1461 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-26 21:42:35 +00:00
chartl	92ea947c33	Added binomProbabilityLog(int k, int n, double p) to MathUtils: binomialProbabilityLog uses a log-space calculation of the binomial pmf to avoid the coefficient blowing up and thus returning Infinity or NaN (or in some very strange cases -Infinity). The log calculation compares very well, it seems, with our current method. It's in MathUtils but could stand testing against rigorous truth data before becoming standard. Added median calculator functions to ListUtils: getQScoreMedian is a new utility I wrote that, given reads and offsets, will find the median Q score. While I was at it, I wrote a similar method, getMedian, which will return the median of any list of Comparables, independent of initial order. These are in ListUtils. Added a new poolseq directory and three walkers: CoverageAndPowerWalker is built on top of the PrintCoverage walker and prints out the power to detect a mutant allele in a pool of 2*(number of individuals in the pool) alleles. It can be flagged either to do this by bootstrapping, or by pure math with a probability of error based on the median Q-score. This walker compiles, runs, and gives quite reasonable outputs that compare visually well to the power calculation computed by Syzygy. ArtificialPoolWalker is designed to take multiple single-sample .bam files and create a (random) artificial pool. The coverage of that pool is a user-defined proportion of the total coverage over all of the input files. The output is not only a new .bam file, but also an auxiliary file that has, for each locus, the genotype of the individuals, the confidence of that call, and that person's representation in the artificial pool .bam at that locus. This walker compiles and, uhh, looks pretty. Needs some testing. AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power calculations (e.g. from Syzygy) and print them to the output file as well for direct downstream comparisons. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:27:50 +00:00
kiran	478f426727	Fixed a missing method implementation in these two files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1459 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:21:58 +00:00
kiran	f12ea3a27e	Added ability for all filters to return a probability for a given variant - interpreted as the probability that the given variant should be included in the final set. The joint probability of all the filters is computed to determine whether a variant should stay or go. At the moment, this is only visible in verbose mode (specify -V). Also removed 'learning mode'; now, filters emit important stats no matter what. Various code cleanups. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1458 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:17:56 +00:00
hanna	e5115409fa	Force columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1457 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 19:54:54 +00:00
aaron	811503d67b	vcf changes from Richard's comments, fixed a test case git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1456 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 14:32:16 +00:00
hanna	ccdb4a0313	General-purpose management of output streams. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-23 00:56:02 +00:00
aaron	b316abd20f	catch a malformed column header name more gracefully git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1453 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 21:05:28 +00:00
aaron	0364f8e989	added the ability of the VCFReader to take in compressed gzipped files natively, which is really useful for the validator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1452 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 18:40:38 +00:00
aaron	647a367680	Made the size zero interval file checker emit a warnUser if we're not in unsafe mode. Also changed the default logger level from error to warn. Does anyone object? It makes sense for users to always get their warn user statements in the default logging level. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1451 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 14:40:57 +00:00
aaron	df9133c90b	the doc on File.length states it returns 0L if it doesn't exist, added a check to make sure it exists (and length < 1) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1450 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 05:55:17 +00:00
aaron	cd711d7697	Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric. Also fixed some verbiage in the GAEngine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 05:35:49 +00:00
asivache	0bdecd8651	A most stupid bug. In cases when more than one indel variant was present in cleaned bam file, the "consensus" (max. # of occurrences) call was computed incorrectly, and most of the time the call itself was not made at all. Fortunately, the locations where we see multiple indels are a minority, and many of them are suspicious anyway (manifestation of alignment problems?). Could change results of POOLED calls though. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1448 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 22:31:44 +00:00
aaron	6313c465fb	we want the RMS of the reads qualities not the RMS of the RMS of the read qualities. Also the VCF version tag seems to be standardized as VCR. Updated the VCF code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 21:56:29 +00:00
kcibul	6c0adc9145	reuse fasta file reader git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1446 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 16:01:58 +00:00
aaron	0386e110cf	some documentation changes, add a couple of simple checks git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1445 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 05:20:27 +00:00
ebanks	10c98c418b	Walker to determine the concordance of 2 genotype call sets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1443 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 01:32:44 +00:00
ebanks	1d74143ef4	A convenience argument - for Mark - so that you don't have to specify all the output file names git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1442 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 00:49:12 +00:00
aaron	5725de56dc	fixes in VCF, some changes to get it ready to move out of the GATK git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1441 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-19 23:31:03 +00:00
aaron	0b927f44fa	created a better separation between instantiation of a VCF object and the object itself git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1440 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-19 20:32:50 +00:00
ebanks	ed8c92a12a	make isReference do the right thing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1439 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-19 20:32:29 +00:00
hanna	21091b9839	Fix for invalid format error when outputting BAM files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1438 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-19 19:42:39 +00:00
aaron	4cf9110468	Adding a lot of changes to the VCF code, plus a new basic validator. Also removing an extra copy of the Artificial SAM generator that got checked in at some point. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1437 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-19 05:08:28 +00:00
ebanks	b3fe566c0c	Fix descriptions of walker args git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1436 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 19:46:48 +00:00
ebanks	82e2b7017e	Prevent array bounds errors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1435 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 16:54:31 +00:00
ebanks	26a6f816c9	set default value for output format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1434 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 16:17:09 +00:00
ebanks	53153fcd79	Allow RODs to specify that incomplete records are okay (i.e. that they allow optional fields) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1433 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 15:26:10 +00:00
ebanks	9b1d7921e8	added filter based on concordance to another call set git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1432 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 15:16:30 +00:00
ebanks	b2a18a9d61	- first pass at a basic indel filter (for now, based on size and homopolymer runs) - fix simple indel rod printout git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 03:04:12 +00:00
ebanks	78439f7305	Modify Sequenom input format based on official documentation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1430 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 01:42:57 +00:00
aaron	63d90702d6	another iteration of the VCFReader and VCFRecord, introducing the VCFWriter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1429 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-17 22:17:34 +00:00
jmaguire	1e8b97b560	quietly skip empty intervals files rather than crash. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1428 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-17 20:19:14 +00:00
jmaguire	92c63fb530	It's just "lod" not discovery_lod now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1427 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-17 18:44:09 +00:00
ebanks	df5744bcd3	update this walker so any variants can be passed in git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1426 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-17 16:30:39 +00:00
aaron	8403618846	the start to the VCF implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1425 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-17 04:34:15 +00:00
ebanks	d4808433a1	Added option to output the locations of indels in the alternate reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1424 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-16 03:46:36 +00:00
ebanks	4b6ddc55bd	Merge our 2 fastq writers into 1: incorporate Kiran's secondary-base file writer into the fasta/fastq writers git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1423 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-14 20:55:23 +00:00
ebanks	0ec581080c	Refactoring the code; also, now it prints continuously instead of potentially storing one long string. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1421 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-13 01:32:46 +00:00
asivache	2a01e71277	A very simple standalone filter for fooling around with the data: can extract only mapped or only unmapped reads, only reads with mapping quals > X, reads with average base qual > Y, reads with min base qual > Z, reads with edit distance from the ref > MIN and/or < MAX git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1420 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:28:51 +00:00
asivache	ebec0ec171	A standalone companion to BamToFastqWalker: does the same thing but without calling in gatk's heavy artillery (does not "require" a reference either). Extracts seqs and quals and places them into fastq; along the way it also reverse complements reads that align to the negative strand (so that fastq contains reads as they come from the machine). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1419 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:24:37 +00:00
asivache	112a283f54	be nice, don't forget to close the reader when done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1418 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:19:56 +00:00
asivache	ba2a3d8a58	Reverse qualities when read seq. is reverse complemented git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1417 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:17:35 +00:00
asivache	144b424933	Added : String reverse(String s) - reverses a string git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1416 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:16:22 +00:00
ebanks	143f8eea4e	option to output in sequenom input format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1415 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 16:50:37 +00:00
ebanks	7f1159b6a9	Added option to mask out SNP sites with "N"s in the new reference. This is useful when producing Sequenom input files for validating indels... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1414 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:17:45 +00:00
ebanks	43f63b7530	Added a walker to convert a bam file to fastq format (including the option to re-reverse the negative strand reads). Picard has such a tool but it is geared towards their pipeline and requires intimate knowledge of the lanes/flowcells,etc. This is just easy. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1413 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:10:40 +00:00
aaron	d101c20b30	added the ability to pass in a csv file of ROD triplets (one triplet per line) to the -B option git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1412 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 22:10:20 +00:00
asivache	e4acd14675	Now GenomicMap maps (and RemapAlignment outputs) regions between intervals on the master reference as 'N' cigar elements, not 'D'. 'D' is now used only for bona fide deletions. Also: do not die if alignment record does not have NM tags (but mapping quality will not be recomputed after remapping/reducing for the lack of required data) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1411 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 21:10:17 +00:00
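
Commit 92ea947c33 above describes computing the binomial pmf in log space so that the binomial coefficient cannot overflow to Infinity or NaN. The following is a minimal sketch of that general technique, not the code from that commit: the class name LogBinomialSketch and both method names are hypothetical, and the actual MathUtils implementation may differ.

```java
/**
 * Illustrative sketch only: a log-space binomial pmf.
 * Class and method names are hypothetical and are not taken from GATK's MathUtils.
 */
public final class LogBinomialSketch {
    private LogBinomialSketch() {}

    /** Natural log of C(n, k), accumulated as a sum of logs so no large factorial is formed. */
    public static double logBinomialCoefficient(int k, int n) {
        int m = Math.min(k, n - k); // C(n, k) == C(n, n - k); use the shorter product
        double sum = 0.0;
        for (int i = 1; i <= m; i++) {
            sum += Math.log(n - m + i) - Math.log(i);
        }
        return sum;
    }

    /** P(X = k) for X ~ Binomial(n, p), evaluated in log space and exponentiated at the end. */
    public static double binomialProbabilityLogSpace(int k, int n, double p) {
        if (k < 0 || k > n) {
            return 0.0; // outside the support
        }
        // Handle the degenerate probabilities explicitly to avoid log(0).
        if (p <= 0.0) {
            return k == 0 ? 1.0 : 0.0;
        }
        if (p >= 1.0) {
            return k == n ? 1.0 : 0.0;
        }
        double logProb = logBinomialCoefficient(k, n)
                + k * Math.log(p)
                + (n - k) * Math.log1p(-p); // log(1 - p), accurate for small p
        return Math.exp(logProb);
    }

    public static void main(String[] args) {
        // Small case that can be checked by hand: C(4, 2) * 0.5^4.
        System.out.println(binomialProbabilityLogSpace(2, 4, 0.5));
        // Large case where forming the coefficient directly would overflow a double.
        System.out.println(binomialProbabilityLogSpace(5000, 10000, 0.5));
    }
}
```

Keeping every intermediate value as a sum of logarithms means only the final result is exponentiated, so it stays finite even when C(n, k) itself is far too large for a double.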


1196 Commits (a17dad5fa9162f9083e52387bb9c8a75ce682eab)