gatk-3.8

Commit Graph

Author	SHA1	Message	Date
ebanks	9b1d7921e8	added filter based on concordance to another call set git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1432 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 15:16:30 +00:00
ebanks	b2a18a9d61	- first pass at a basic indel filter (for now, based on size and homopolymer runs) - fix simple indel rod printout git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 03:04:12 +00:00
ebanks	78439f7305	Modify Sequenom input format based on official documentation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1430 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 01:42:57 +00:00
ebanks	d4808433a1	Added option to output the locations of indels in the alternate reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1424 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-16 03:46:36 +00:00
ebanks	4b6ddc55bd	Merge our 2 fastq writers into 1: incorporate Kiran's secondary-base file writer into the fasta/fastq writers git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1423 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-14 20:55:23 +00:00
ebanks	0ec581080c	Refactoring the code; also, now it prints continuously instead of potentially storing one long string. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1421 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-13 01:32:46 +00:00
asivache	2a01e71277	A very simple standalone filter for fooling around with the data: can extract only mapped or only unmapped reads, only reads with mapping quals > X, reads with average base qual > Y, reads with min base qual > Z, reads with edit distance from the ref > MIN and/or < MAX git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1420 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:28:51 +00:00
asivache	ebec0ec171	A standalone companion to BamToFastqWalker: does the same thing but without calling in gatk's heavy artillery (does not "require" a reference either). Extracts seqs and quals and places them into fastq; along the way it also reverse complements reads that align to the negative strand (so that fastq contains reads as they come from the machine). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1419 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:24:37 +00:00
asivache	112a283f54	be nice, don't forget to close the reader when done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1418 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:19:56 +00:00
asivache	ba2a3d8a58	Reverse qualities when read seq. is reverse complemented git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1417 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:17:35 +00:00
ebanks	143f8eea4e	option to output in sequenom input format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1415 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 16:50:37 +00:00
ebanks	7f1159b6a9	Added option to mask out SNP sites with "N"s in the new reference. This is useful when producing Sequenom input files for validating indels... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1414 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:17:45 +00:00
ebanks	43f63b7530	Added a walker to convert a bam file to fastq format (including the option to re-reverse the negative strand reads). Picard has such a tool but it is geared towards their pipeline and requires intimate knowledge of the lanes/flowcells, etc. This is just easy. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1413 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:10:40 +00:00
asivache	e4acd14675	Now GenomicMap maps (and RemapAlignment outputs) regions between intervals on the master reference as 'N' cigar elements, not 'D'. 'D' is now used only for bona fide deletions. Also: do not die if alignment record does not have NM tags (but mapping quality will not be recomputed after remapping/reducing for the lack of required data) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1411 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 21:10:17 +00:00
ebanks	5fab934f4e	- moved the reference maker to its own directory - added first version of a more complicated reference maker which takes in RODs and creates an alternative reference based on the variants (indels and/or SNPs) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1409 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 18:01:06 +00:00
sjia	1851613de4	Now using larger database of HLA alleles git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1405 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 03:11:14 +00:00
asivache	3208eaabcc	A standalone picard-level tool for breaking individual reads into "pairs" of first/last N bases. Supports: * splitting off only start or end of the read, or both; the output will contain chopped sequences AND corresponding base qualities * splitting arbitrary number of bases off each end (different numbers for left and right segments can be specified; segments can overlap) * splitting only unmapped reads, ignoring mapped ones * writing split ends into separate sam/bam files, or into a single output file * decorating original read names with user-specified suffixes for each end (e.g. _1 and _2 for left and right parts of the read); default: no decoration, original read names are used * when mapped reads are split, the alignment cigars are chopped appropriately and the alignment start positions are adjusted (for the right end) to correctly specify the alignment of the selected part of the read git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1402 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 20:42:49 +00:00
asivache	36312ae4b2	tiny cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1401 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 20:26:52 +00:00
asivache	921d4f4e95	RemapAlignments is a standalone picard-level tool that does not use gatk engine; moved to 'tools' git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1396 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 15:41:07 +00:00
depristo	089dab00e2	Was discordance rate, now concordance rate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1393 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:37:52 +00:00
depristo	6d3ef73868	Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:37:07 +00:00
depristo	a864c2f025	Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1390 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:00:06 +00:00
ebanks	db250f8d3e	Don't print if not in learning mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1389 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 06:08:02 +00:00
ebanks	4c1fa52ddf	-Added mapping quality zero filter -Set some reasonable defaults (based on pilot2) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1388 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 03:18:02 +00:00
sjia	d60d5aa516	Fixed bug: previously reset likelihoods after each region/exon. Better comments/documentation added git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1386 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 18:44:46 +00:00
kcibul	0d47798721	made booster distance a parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1385 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 18:29:21 +00:00
ebanks	3b74b3ba74	print out ref/alt ratio, not major/minor git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1384 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 16:36:25 +00:00
depristo	65e9dcf5b7	Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum; (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc.; this is just for calculating genotype likelihoods now; (3) now performs extensive error checking with validate() to ensure the system is behaving properly; (4) fixed incorrect treatment of N bases, which were being counted against everyone; (5) likely found a stats bug in which heterozygosity was being applied incorrectly to the genotype priors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 01:00:55 +00:00
sjia	68309408e4	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1378 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:23:01 +00:00
sjia	45ab212f22	Post-presentation update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1377 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:21:12 +00:00
hanna	21d1eba502	Cleaned division of responsibilities between arguments to map function. Reference has been changed from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect the fact that it contains contextual information only about the alignments, not the locus in general. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:01:37 +00:00
kcibul	a5a7d7dab8	added "booster" metrics git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1375 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 20:53:45 +00:00
ebanks	3a8d923785	minor output changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1374 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 20:12:16 +00:00
mmelgar	939b19e715	Committing the first version of the homopolymer filter. Removes SNPs that occur at the edges of homopolymer runs and whose nonref allele matches the repeated base in the homopolymer. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1373 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 14:35:51 +00:00
depristo	20ff603339	New hotness and old and Busted genotype likelihood objects are now in the code base as I work towards a bug-free SSG along with a cleaner interface to the genotype likelihood object git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1372 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 23:07:53 +00:00
depristo	3485397483	Reorganization of the genotyping system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 20:55:31 +00:00
ebanks	9f1d3aed26	-Output single filtration stats file with input from all filters -move out isHet test to GenotypeUtils so all can use it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1369 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 20:44:21 +00:00
depristo	d840a47b11	Slight reorganization of genotype interface git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 19:17:15 +00:00
depristo	20986a03de	cleanup before moving files git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1365 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 19:08:24 +00:00
ebanks	e3b08f245f	Pull out RMS calculation into MathUtils for all to use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1364 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 17:00:20 +00:00
ebanks	e495b836d3	- added mapping quality filter - make the filters brainless in that they strictly have thresholds and filter based on them; require user to calculate and input these thresholds. - update filters in preparation for migration to new output format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1363 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-03 16:46:51 +00:00
kiran	8bc925a216	Commit on the behalf of Mark: cleaning up some old and busted code in GenotypeLikelihood and associated objects. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1361 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-31 21:18:30 +00:00
aaron	9dfee7a75c	the "-genotype" option now acts correctly as a discovery mode caller in SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1359 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-31 18:31:45 +00:00
sjia	9dada95ec3	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1357 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-31 16:21:16 +00:00
andrewk	678c2533ca	Removed custom output stream for file and replaced with the standard out PrintStream git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1350 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 22:36:42 +00:00
andrewk	44673b2dce	Removed a debugging println that was accidentally checked in git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1348 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 22:23:27 +00:00
andrewk	845488ff94	VariantEval now decides whether a variant is not confidently called using BestVsNextBest if genotypes are being evaluated and BestVsRef if not (variant discovery only). Also, the absolute value of the BestVsRef LOD (getVariantionConfidence) is used so that confident reference calls (if the GELI has output them) will show up in the final table as reference calls rather than no calls. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1347 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 21:54:06 +00:00
andrewk	fdc7cc555b	Removed extra column name from geliHeaderString that was mislabeling the 10 genotype likelihoods by shifting them over by one git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1345 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 21:42:02 +00:00
aaron	0087234ed7	small code cleanup, a couple of little changes to SSGGenotypeCall git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1343 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 19:47:37 +00:00
ebanks	fbc7d44bc7	don't allow users to input priors anymore; they should be using heterozygosity and having the SSG calculate priors. Note that nothing was changed for dbSNP/hapmap priors (not sure what we want to do with these yet - any thoughts?) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1342 348d0f76-0448-11de-a6fe-93d51630548a	2009-07-30 19:10:33 +00:00

552 Commits (53153fcd79bd08a78868e21998bdda009b83a8d3)