gatk-3.8

Commit Graph

Author	SHA1	Message	Date
chartl	d6a0b65ac9	Changes: Rollback of Variant-related changes of r1585, additional PGC code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 16:23:01 +00:00
chartl	0c54aba92a	Changes: @VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object. @AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones. @ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true. Added: @PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 15:01:50 +00:00
ebanks	e24c8d00d5	So, the VCF spec allows for an optional meta field in the header representing the date. However, using this field means that integration tests run on the vcf file will fail the MD5 test (which is what happened to the VariantFiltration test this morning after working just fine yesterday). After consulting our resident expert (Aaron), we're going to (temporarily) remove the date from the vcf output until we can come up with a better solution. However, this shouldn't cause any short-term problems because the data truly is optional. VF test's MD5s are updated. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1580 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 14:28:43 +00:00
asivache	d9f3e9493f	Does not return 0-length cigar elements anymore (used to do so when previous cigar element ended exactly at the segment boundary) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1570 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:05:55 +00:00
ebanks	cb31d5a0ab	VariantFiltration now outputs VCF. Important changes: 1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD. 2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF. If there are any problems, go find Aaron. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:04:32 +00:00
chartl	9c7f456510	Changed the short name on the PoolSize cmd line argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1560 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:53:22 +00:00
chartl	9d69bd2c84	Modifications: @CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position @VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As now, the variable is unused. @AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them New Additions: Filtered Contexts Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes it a bit easier, as each of these objects are children of AlignmentContext, and can be instantiated from a "raw" AlignmentContext. @FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method. @ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction @ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction @QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided). A unit test bamfile and associated unit tests for these are in the works. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:49:52 +00:00
asivache	0721c450c2	Bug fix: single unmapped read now keeps mapping qual 0 after remapping, not 37! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1557 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:29:34 +00:00
depristo	ec0f6f23c7	LocusIterationByState is now the system default. Fixed Aaron's build problem git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 01:28:05 +00:00
aaron	ea6ffd3796	initial VariantEvalWalker test. More to be added soon... Also fixed the case where MD5 sums had leading zeros clipped off git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1551 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 01:02:04 +00:00
sjia	600c234643	Starting code on phasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1548 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 15:20:38 +00:00
aaron	3276e01e5f	fixing the build git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1546 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 13:13:55 +00:00
kiran	fd20f5c2e8	For a file or files backed by a ROD implementing AllelicVariant, outputs a VCF file summarizing the information. Metadata like Hapmap and dbSNP membership, genotype LOD, read depth, etc, are annotated appropriately. The results output by this program are equivalent to those given by Gelis2PopSNPs.py. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1544 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 06:12:18 +00:00
ebanks	4a95f2181d	print out the right variant git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1543 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 01:37:35 +00:00
sjia	5791da17ae	Updated to reference HLA database of unique 4 digit alleles git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1542 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-07 22:12:56 +00:00
ebanks	5dbba6711c	Lots of changes: (I'll send email out in a sec) 1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it). 2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing). 3) Have indel rod print samples git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-07 01:12:09 +00:00
sjia	471ca8201e	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1537 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 19:12:46 +00:00
aaron	0cc634ed5d	-Renamed rodVariants to RodGeliText -Remove KGenomesSNPROD -Remove rodFLT -Renamed rodGFF to RodGenotypeChipAsGFF -Fixed a problem in SSGenotypeCall -Added basic SSGenotype Test class -Make VCFHeader constructors public git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 18:40:43 +00:00
ebanks	6c476514f8	Moved to core. Wiki pages are going up; unit tests will be written soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1533 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 15:09:11 +00:00
ebanks	42c71b4382	Fix for Kris: now SNPs aren't masked by default (only when they come from a mask rod) and we can design Sequenom validation assays for them. I'll move this all to core in a bit... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1532 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 14:52:06 +00:00
depristo	a08c68362e	Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls AND then compares the geli MD5 sum to the expected one! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 12:39:06 +00:00
aaron	3c2ae55859	changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 05:31:15 +00:00
ebanks	2241173fff	In order to help learn python, I decided to convert Michael's DoC python script to Java; the CoverageHistogram now spits out standard deviations for a good Gaussian fit. This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1528 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 02:23:57 +00:00
chartl	544900aa99	Migration of some core calculations (log-likelihood probabilities, etc.) from CoverageAndPowerWalker into static methods in PoolUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1527 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 21:43:29 +00:00
chartl	93cedf4285	Added items: @/varianteval/PoolAnalysis Interface to identify variant analyses that are pool-specific. @/varianteval/BasicPoolVariantAnalysis Nearly the same as BasicVariantAnalysis with the addition of a protected integer (numIndividualsInPool) which holds the pool size. One soulcrushing change is that "protected String filename" needed to become "protected String[] filename" since now multiple truth files may be looked at. It was tempting to make the change in BasicVariantAnalysis with some default methods that would maintain usability of the remainder of the VariantAnalysis objects, but I decided to hold off. We can always merge these together later. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1526 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 21:26:04 +00:00
sjia	ee06c7f29f	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1525 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:41:12 +00:00
sjia	043c97eede	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1524 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:34:42 +00:00
aaron	c849282e44	reverting the HLA walker changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1523 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:11:57 +00:00
asivache	5202d959bf	NM attribute changed in sam jdk (?) from Integer to Short, or maybe it is presented differently by the reader depending on whether SAM or BAM is processed; in any case, both Integer and Short are safe now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1522 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:03:32 +00:00
sjia	ada4c5a13c	Small change to debug printing code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1521 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 18:31:21 +00:00
kiran	c3aaca1262	Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into its own SAMFileReader-like entity. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1520 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 17:20:16 +00:00
asivache	499b3536a4	Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1519 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 16:45:07 +00:00
ebanks	5bd99fc1c4	VariantFiltration moved to core. Another win for the team. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1517 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 15:41:41 +00:00
chartl	5130ca9b94	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1516 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 15:17:02 +00:00
jmaguire	e2780c17af	Checkin of the Multi-Sample SNP caller. Doesn't work yet; same command I used to use now causes GATK to throw an exception. Will check with Matt & Aaron tomorrow, then do a regression test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1509 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 00:23:28 +00:00
ebanks	55013eff78	Re-revert back to point estimation for now. We need to do this right, just not yet. Also, it's safer to let colt do the log factorial calculations for us. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1503 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 15:33:18 +00:00
ebanks	24d809133d	Oops - comment out the printouts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1500 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 01:45:56 +00:00
ebanks	91ccb0f8c5	Revert to having these filters use integration over binomial probs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1499 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 01:40:22 +00:00
aaron	4a1d79cd7b	added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads. The user is warned if a locus exceeds this threshold, and no more reads are added. Also CombineDup walker had an incorrect package name. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 04:21:58 +00:00
ebanks	0addae967a	IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1495 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 03:34:39 +00:00
asivache	591f8eedbb	Added setName() and getName() (however, not used anywhere yet). Now can set the name of the fasta record manually to whatever, however it will work only if done early enough. If the fasta record already started printing itself (i.e. the header line is already done), setName() will throw an exception. Could be too entangled, may reverse this back... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1493 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 18:09:55 +00:00
asivache	c9eb193c7f	Now recognizes a special name for a bound rod track: snpmask. If a rod with this name is bound, then ONLY snps from that track will be used (to set alt reference bases to N's), but indels will be ignored. This helps when an alt. ref has to be created for a set of indel calls, and another rod (e.g. dbSNP) is used to put N's in (for sequenom). If dbSNP rod is not marked as "snpmask", the indels reported there will make their way into the alt. reference output and mess it up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1492 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 18:05:57 +00:00
ebanks	8e3c3324fa	Added filter for SNPs cleaned out by the realigner. It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1489 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 04:32:32 +00:00
ebanks	463f80c03e	Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1487 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:37:24 +00:00
ebanks	1a299dd459	Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1486 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:31:37 +00:00
ebanks	e70101febc	Add a VEC filter for clustered SNP calls that takes advantage of the new windowed approach; delete the old standalone walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1485 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:14:42 +00:00
ebanks	215e908a11	Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters. This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 02:16:39 +00:00
depristo	49a7babb2c	Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 19:16:30 +00:00
depristo	5af4bb628b	Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1479 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 17:34:43 +00:00
depristo	6ab9ddf9f5	Significant output formatting improvements. SNPs as indels analysis. heterozygosity rate calculations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1478 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-29 21:49:09 +00:00
depristo	f0179109fa	Removing min confidence for on/off genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1473 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 01:04:13 +00:00
depristo	dc9d40eb9a	Now requires a minimum genotype LOD before applying tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1471 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 00:19:23 +00:00
depristo	a639459112	Trivial consistency change from char in to char out, not char in to byte out git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1466 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 23:37:37 +00:00
chartl	6012f7602b	@ minor fixes to CoverageAndPowerWalker and AnalyzePowerWalker (switching to By Reference traversal, spitting out Syzygy position for sanity check) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1465 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 21:44:18 +00:00
chartl	bd1e679bc5	@ Fixed issues with AnalyzePowerWalker which depended on CoverageAndPowerWalker. The latter was changed but not the former. Now fixed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1464 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 20:23:41 +00:00
kiran	a17dad5fa9	Converts from fastq.gz to unaligned BAM format. Accepts a single fastq (for single-end run) or two fastqs (for paired-end run). Also allows you to set certain BAM metadata (read groups, etc.). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1463 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 20:20:09 +00:00
chartl	8740124cda	@ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0. @MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation. @PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups by read direction. @CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than doing that calculation within the object itself. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 19:31:53 +00:00
chartl	1da45cffb3	New: Minor changes to CoverageAndPowerWalker bootstrapping (faster selection of indices). Entirely new Artificial Pool Walker (ArtificialPoolWalkerMk2), will likely replace ArtificialPoolWalker on the next commit. Adapted the method of sampling, and added a helper context class: ArtificialPoolContext which carries much of the burden of calculation and data handling for the walker. The walker itself maps and reduces ArtificialPoolContexts. Cheers! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1461 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-26 21:42:35 +00:00
chartl	92ea947c33	Added binomProbabilityLog(int k, int n, double p) to MathUtils: binomialProbabilityLog uses a log-space calculation of the binomial pmf to avoid the coefficient blowing up and thus returning Infinity or NaN (or in some very strange cases -Infinity). The log calculation compares very well, it seems with our current method. It's in MathUtils but could stand testing against rigorous truth data before becoming standard. Added median calculator functions to ListUtils getQScoreMedian is a new utility I wrote that given reads and offsets will find the median Q score. While I was at it, I wrote a similar method, getMedian, which will return the median of any list of Comparables, independent of initial order. These are in ListUtils. Added a new poolseq directory and three walkers CoverageAndPowerWalker is built on top of the PrintCoverage walker and prints out the power to detect a mutant allele in a pool of 2*(number of individuals in the pool) alleles. It can be flagged either to do this by bootstrapping, or by pure math with a probability of error based on the median Q-score. This walker compiles, runs, and gives quite reasonable outputs that compare visually well to the power calculation computed by Syzygy. ArtificialPoolWalker is designed to take multiple single-sample .bam files and create a (random) artificial pool. The coverage of that pool is a user-defined proportion of the total coverage over all of the input files. The output is not only a new .bam file, but also an auxiliary file that has for each locus, the genotype of the individuals, the confidence of that call, and that person's representation in the artificial pool .bam at that locus. This walker compiles and, uhh, looks pretty. Needs some testing. AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power calculations (e.g. from Syzygy) and print them to the output file as well for direct downstream comparisons. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:27:50 +00:00
kiran	478f426727	Fixed a missing method implementation in these two files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1459 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:21:58 +00:00
kiran	f12ea3a27e	Added ability for all filters to return a probability for a given variant - interpreted as the probability that the given variant should be included in the final set. The joint probability of all the filters is computed to determine whether a variant should stay or go. At the moment, this is only visible in verbose mode (specify -V). Also removed 'learning mode'; now, filters emit important stats no matter what. Various code cleanups. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1458 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:17:56 +00:00
asivache	0bdecd8651	A most stupid bug. In cases when more than one indel variant was present in cleaned bam file, the "consensus" (max. # of occurrences) call was computed incorrectly, and most of the times the call itself was not made at all. Fortunately, the locations where we see multiple indels are a minority, and many of them are suspicious anyway (manifestation of alignment problems?). Could change results of POOLED calls though. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1448 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 22:31:44 +00:00
kcibul	6c0adc9145	reuse fasta file reader git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1446 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 16:01:58 +00:00
ebanks	10c98c418b	Walker to determine the concordance of 2 genotype call sets. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1443 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 01:32:44 +00:00
ebanks	1d74143ef4	A convenience argument - for Mark - so that you don't have to specify all the output file names git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1442 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 00:49:12 +00:00
ebanks	82e2b7017e	Prevent array bounds errors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1435 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 16:54:31 +00:00
ebanks	26a6f816c9	set default value for output format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1434 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 16:17:09 +00:00
ebanks	9b1d7921e8	added filter based on concordance to another call set git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1432 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 15:16:30 +00:00
ebanks	b2a18a9d61	- first pass at a basic indel filter (for now, based on size and homopolymer runs) - fix simple indel rod printout git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 03:04:12 +00:00
ebanks	78439f7305	Modify Sequenom input format based on official documentation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1430 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-18 01:42:57 +00:00
ebanks	d4808433a1	Added option to output the locations of indels in the alternate reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1424 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-16 03:46:36 +00:00
ebanks	4b6ddc55bd	Merge our 2 fastq writers into 1: incorporate Kiran's secondary-base file writer into the fasta/fastq writers git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1423 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-14 20:55:23 +00:00
ebanks	0ec581080c	Refactoring the code; also, now it prints continuously instead of potentially storing one long string. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1421 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-13 01:32:46 +00:00
asivache	2a01e71277	A very simple standalone filter for fooling around with the data: can extract only mapped or only unmapped reads, only reads with mapping quals > X, reads with average base qual > Y, reads with min base qual > Z, reads with edit distance from the ref > MIN and/or < MAX git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1420 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:28:51 +00:00
asivache	ebec0ec171	A standalone companion to BamToFastqWalker: does the same thing but without calling in gatk's heavy artillery (does not "require" a reference either). Extracts seqs and quals and places them into fastq; along the way it also reverse complements reads that align to the negative strand (so that fastq contains reads as they come from the machine). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1419 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:24:37 +00:00
asivache	112a283f54	be nice, don't forget to close the reader when done git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1418 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:19:56 +00:00
asivache	ba2a3d8a58	Reverse qualities when read seq. is reverse complemented git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1417 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 20:17:35 +00:00
ebanks	143f8eea4e	option to output in sequenom input format git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1415 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 16:50:37 +00:00
ebanks	7f1159b6a9	Added option to mask out SNP sites with "N"s in the new reference. This is useful when producing Sequenom input files for validating indels... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1414 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:17:45 +00:00
ebanks	43f63b7530	Added a walker to convert a bam file to fastq format (including the option to re-reverse the negative strand reads). Picard has such a tool but it is geared towards their pipeline and requires intimate knowledge of the lanes/flowcells,etc. This is just easy. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1413 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-12 15:10:40 +00:00
asivache	e4acd14675	Now GenomicMap maps (and RemapAlignment outputs) regions between intervals on the master reference as 'N' cigar elements, not 'D'. 'D' is now used only for bona fide deletions. Also: do not die if alignment record does not have NM tags (but mapping quality will not be recomputed after remapping/reducing for the lack of required data) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1411 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 21:10:17 +00:00
ebanks	5fab934f4e	- moved the reference maker to its own directory - added first version of a more complicated reference maker which takes in RODs and creates an alternative reference based on the variants (indels and/or SNPs) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1409 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 18:01:06 +00:00
sjia	1851613de4	Now using larger database of HLA alleles git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1405 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-11 03:11:14 +00:00
asivache	3208eaabcc	A standalone picard-level tool for breaking individual reads into "pairs" of first/last N bases. Supports: * splitting off only start or end of the read, or both; the output will contain chopped sequences AND corresponding base qualities * splitting arbitrary number of bases off each end (different numbers for left and right segments can be specified; segments can overlap) * splitting only unmapped reads, ignoring mapped ones * writing splitted ends into separate sam/bam files, or into a single output file * decorating original read names with user-specified suffixes for each end (e.g. _1 and _2 for left and right parts of the read); default: no decoration, original read names are used * when mapped reads are split, the alignment cigars are chopped appropriately and the alignment start positions are adjusted (for the right end) to correctly specify the alignment of the selected part of the read git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1402 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 20:42:49 +00:00
asivache	36312ae4b2	tiny cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1401 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 20:26:52 +00:00
asivache	921d4f4e95	RemapAlignments is a standalone picard-level tool that does not use gatk engine; moved to 'tools' git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1396 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-10 15:41:07 +00:00
depristo	089dab00e2	Was discordance rate, now concordance rate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1393 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:37:52 +00:00
depristo	6d3ef73868	Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:37:07 +00:00
depristo	a864c2f025	Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1390 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 19:00:06 +00:00
ebanks	db250f8d3e	Don't print if not in learning mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1389 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 06:08:02 +00:00
ebanks	4c1fa52ddf	-Added mapping quality zero filter -Set some reasonable defaults (based on pilot2) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1388 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-07 03:18:02 +00:00
sjia	d60d5aa516	Fixed bug: previously reset likelihoods after each region/exon. Better comments/documentation added git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1386 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 18:44:46 +00:00
kcibul	0d47798721	made booster distance a parameter git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1385 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 18:29:21 +00:00
ebanks	3b74b3ba74	print out ref/alt ratio, not major/minor git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1384 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 16:36:25 +00:00
depristo	65e9dcf5b7	Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which were being counted against everyone (5) likely found a stats bug in which heterozygosity was being applied incorrectly to the genotype priors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-05 01:00:55 +00:00
sjia	68309408e4	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1378 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:23:01 +00:00
sjia	45ab212f22	Post-presentation update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1377 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:21:12 +00:00
hanna	21d1eba502	Cleaned division of responsibilities between arguments to map function. Reference has been changed from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect the fact that it contains contextual information only about the alignments, not the locus in general. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 21:01:37 +00:00
kcibul	a5a7d7dab8	added "booster" metrics git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1375 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 20:53:45 +00:00
ebanks	3a8d923785	minor output changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1374 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-04 20:12:16 +00:00

669 Commits (235de38c2e2f62b9cc2c757ef22648083c17f5c5)