Commit Graph

  • 2a6f3a03c9 update script to put pilot1 bams directly onto hphome ebanks 2009-09-08 14:41:35 +0000
  • 3276e01e5f fixing the build aaron 2009-09-08 13:13:55 +0000
  • f963cfcb21 Made enum listing header fields public. kiran 2009-09-08 06:12:59 +0000
  • fd20f5c2e8 For a file or files backed by a ROD implementing AllelicVariant, outputs a VCF file summarizing the information. Metadata like Hapmap and dbSNP membership, genotype LOD, read depth, etc, are annotated appropriately. The results output by this program are equivalent to those given by Gelis2PopSNPs.py. kiran 2009-09-08 06:12:18 +0000
  • 4a95f2181d print out the right variant ebanks 2009-09-08 01:37:35 +0000
  • 5791da17ae Updated to reference HLA database of unique 4 digit alleles sjia 2009-09-07 22:12:56 +0000
  • e716f9337d A few more additions; almost done... ebanks 2009-09-07 01:50:22 +0000
  • 5dbba6711c Lots of changes: (I'll send email out in a sec) 1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it). 2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing). 3) Have indel rod print samples ebanks 2009-09-07 01:12:09 +0000
  • 1c3d67f0f3 Improvements to the CountCovariates and TableRecalibrator, as well as regression tests for SLX and 454 data depristo 2009-09-04 22:26:57 +0000
  • 2b0d1c52b2 General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG depristo 2009-09-04 19:13:37 +0000
  • 471ca8201e git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1537 348d0f76-0448-11de-a6fe-93d51630548a sjia 2009-09-04 19:12:46 +0000
  • 0cc634ed5d -Renamed rodVariants to RodGeliText -Remove KGenomesSNPROD -Remove rodFLT -Renamed rodGFF to RodGenotypeChipAsGFF -Fixed a problem in SSGenotypeCall -Added basic SSGenotype Test class -Make VCFHeader constructors public aaron 2009-09-04 18:40:43 +0000
  • fd1c72c151 Fixed package name ebanks 2009-09-04 15:40:06 +0000
  • 82d99cbe43 Remove dir ebanks 2009-09-04 15:13:02 +0000
  • 6c476514f8 Moved to core. Wiki pages are going up; unit tests will be written soon. ebanks 2009-09-04 15:09:11 +0000
  • 42c71b4382 Fix for Kris: now SNPs aren't masked by default (only when they come from a mask rod) and we can design Sequenom validation assays for them. I'll move this all to core in a bit... ebanks 2009-09-04 14:52:06 +0000
  • 849dce799d This rod was all wrong for generating the alternate snp alleles (it returned null or even the wrong value); fixed. ebanks 2009-09-04 14:21:46 +0000
  • a08c68362e Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* then compares the geli MD5 sum to the expected one! depristo 2009-09-04 12:39:06 +0000
  • 3c2ae55859 changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon. aaron 2009-09-04 05:31:15 +0000
  • 2241173fff In order to help learn python, I decided to convert Michael's DoC python script to Java; the CoverageHistogram now spits out standard deviations for a good Gaussian fit. This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly. ebanks 2009-09-04 02:23:57 +0000
  • 544900aa99 Migration of some core calculations (log-likelihood probabilities, etc.) from CoverageAndPowerWalker into static methods in PoolUtils chartl 2009-09-03 21:43:29 +0000
  • 93cedf4285 --------------- | Added items | --------------- chartl 2009-09-03 21:26:04 +0000
  • ee06c7f29f git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1525 348d0f76-0448-11de-a6fe-93d51630548a sjia 2009-09-03 19:41:12 +0000
  • 043c97eede git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1524 348d0f76-0448-11de-a6fe-93d51630548a sjia 2009-09-03 19:34:42 +0000
  • c849282e44 reverting the HLA walker changes aaron 2009-09-03 19:11:57 +0000
  • 5202d959bf NM attribute changed in sam jdk (?) from Integer to Short, or maybe it is presented differently by the reader depending on whether SAM or BAM is processed; in any case, both Integer and Short are safe now asivache 2009-09-03 19:03:32 +0000
  • ada4c5a13c Small change to debug printing code sjia 2009-09-03 18:31:21 +0000
  • c3aaca1262 Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into its own SAMFileReader-like entity. kiran 2009-09-03 17:20:16 +0000
  • 499b3536a4 Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping" asivache 2009-09-03 16:45:07 +0000
  • 61d4dd4d01 Remove playground version ebanks 2009-09-03 15:45:26 +0000
  • 5bd99fc1c4 VariantFiltration moved to core. Another win for the team. ebanks 2009-09-03 15:41:41 +0000
  • 5130ca9b94 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1516 348d0f76-0448-11de-a6fe-93d51630548a chartl 2009-09-03 15:17:02 +0000
  • 3ac5ac066f Checking in Michael's DoC parameterization script; this functionality will eventually be moved into VariantFiltration ebanks 2009-09-03 15:07:49 +0000
  • 515fc7c476 overaggressively removed the STD outputs, back in for tests aaron 2009-09-03 15:07:45 +0000
  • 7d0a13d711 added options for building with xml output to files aaron 2009-09-03 15:00:27 +0000
  • d804a119dc script to run the complete pilot2 pipeline: from cleaning to calling to filtering [not quite finished though] ebanks 2009-09-03 14:35:55 +0000
  • bdd0a6f9fa change to make build work depristo 2009-09-03 13:43:10 +0000
  • b01ac9de0c High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and potentially more) speedups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still in testing but passes validating pileup. depristo 2009-09-03 03:06:25 +0000
  • e2780c17af Checkin of the Multi-Sample SNP caller. jmaguire 2009-09-03 00:23:28 +0000
  • e2a79c5cd9 Checkpoint. The BWT that we generate now matches the first 16% of the BWT that BWT-SW generates. Cleaned up output streams to separate the byte packing / word packing from the data structure generation. hanna 2009-09-02 22:18:17 +0000
  • 3dfc77dc89 Add an indel rod which represents the initial point of the indel only (useful for alternate reference making) ebanks 2009-09-02 19:32:29 +0000
  • 58debd7e56 A convenience shortcut isReadUnmapped() added: thanks to SAM format specification, 'read unmapped' flag is not always required to be set for an unmapped read; this method checks both the flag and the alignment reference index/start (if those are set to '*' the flag is not required according to the spec!) asivache 2009-09-02 17:00:39 +0000
  • 0e6feff8f2 fixed locus pile-up limiting problem aaron 2009-09-02 16:56:44 +0000
  • d8aff9a925 Bug fixes. Was ignoring the '$' character in a few places where I shouldn't have been. hanna 2009-09-02 16:27:31 +0000
  • 55013eff78 Re-revert back to point estimation for now. We need to do this right, just not yet. Also, it's safer to let colt do the log factorial calculations for us. ebanks 2009-09-02 15:33:18 +0000
  • eb664ae287 Added VariantFiltrationWalker to GATK early release. hanna 2009-09-02 02:17:50 +0000
  • 1ada085970 Cruddy implementation of BWT creation, for understanding and testing purposes. hanna 2009-09-02 02:16:56 +0000
  • 24d809133d Oops - comment out the printouts ebanks 2009-09-02 01:45:56 +0000
  • 91ccb0f8c5 Revert to having these filters use integration over binomial probs ebanks 2009-09-02 01:40:22 +0000
  • 05c164ec69 changing the default behavior to allow any sized read pile-up (which may exceed the memory limit); the user can then select their own read limit. The default of 100K was arbitrary. aaron 2009-09-01 14:46:00 +0000
  • 54c0b6c430 Allow this ROD to consist of just the positions ebanks 2009-09-01 12:43:18 +0000
  • 4a1d79cd7b added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads. The user is warned if a locus exceeds this threshold, and no more reads are added. aaron 2009-09-01 04:21:58 +0000
  • 0addae967a IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position ebanks 2009-09-01 03:34:39 +0000
  • 85ca68fab6 Initial version: creates a packed file from a fasta, suitable for consumption by BWT-SW. Works with E coli fasta, but will not work at this moment with multi-chr fastas. Will be made into a utility routine when BWA comes together. hanna 2009-08-31 18:39:19 +0000
  • 591f8eedbb Added setName() and getName() (however, not used anywhere yet). Now can set the name of the fasta record manually to whatever, however it will work only if done early enough. If the fasta record already started printing itself (i.e. the header line is already done), setName() will throw an exception. Could be too entangled, may reverse this back... asivache 2009-08-31 18:09:55 +0000
  • c9eb193c7f Now recognizes a special name for a bound rod track: snpmask. If a rod with this name is bound, then ONLY snps from that track will be used (to set alt reference bases to N's), but indels will be ignored. This helps when an alt. ref has to be created for a set of indel calls, and another rod (e.g. dbSNP) is used to put N's in (for sequenom). If dbSNP rod is not marked as "snpmask", the indels reported there will make their way into the alt. reference output and mess it up. asivache 2009-08-31 18:05:57 +0000
  • dab7b6e825 A useful perl module for quick argument parsing. kiran 2009-08-31 15:44:57 +0000
  • 5d155440cd A useful, rule-based parallel job dispatcher. kiran 2009-08-31 15:40:21 +0000
  • 8e3c3324fa Added filter for SNPs cleaned out by the realigner. It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work. ebanks 2009-08-31 04:32:32 +0000
  • 8bc7afe781 Smarter SW penalties ebanks 2009-08-31 04:29:19 +0000
  • 463f80c03e Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context ebanks 2009-08-31 03:37:24 +0000
  • 1a299dd459 Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context ebanks 2009-08-31 03:31:37 +0000
  • e70101febc Add a VEC filter for clustered SNP calls that takes advantage of the new windowed approach; delete the old standalone walker. ebanks 2009-08-31 03:14:42 +0000
  • 215e908a11 Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters. This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible. ebanks 2009-08-31 02:16:39 +0000
  • 2402dcd4c9 Give usage message if no arguments provided. andrewk 2009-08-31 00:28:43 +0000
  • 813a4e838f Removing old code depristo 2009-08-30 19:27:11 +0000
  • 49a7babb2c Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory). depristo 2009-08-30 19:16:30 +0000
  • 522e4a77ae Caching support across multiple technologies depristo 2009-08-30 18:10:14 +0000
  • 5af4bb628b Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup depristo 2009-08-30 17:34:43 +0000
  • 6ab9ddf9f5 Significant output formatting improvements. SNPs as indels analysis. heterozygosity rate calculations depristo 2009-08-29 21:49:09 +0000
  • bde67428fd Better formatting of the code depristo 2009-08-29 21:46:47 +0000
  • 6c604af86c Nicer building of scala programs depristo 2009-08-29 16:41:56 +0000
  • 8331c195fb changed the full name of maximum_reads to maximum_iterations for consistency aaron 2009-08-28 16:03:46 +0000
  • 8e129d76fd Support for original quality scores OQ flag. pQ flag in TableRecalibration to preserve quality scores below a threshold (defaulting to 5) depristo 2009-08-28 14:14:21 +0000
  • f0179109fa Removing min confidence for on/off genotype depristo 2009-08-28 01:04:13 +0000
  • 4f7ed69242 toString() implemented depristo 2009-08-28 01:03:58 +0000
  • dc9d40eb9a Now requires a minimum genotype LOD before applying tests depristo 2009-08-28 00:19:23 +0000
  • 37a9b84276 corresponding test depristo 2009-08-28 00:17:42 +0000
  • bf60980653 Experimental support for empirical P(B_true | B_miscall). --useEmpiricalTransitions flag to SSG enables this support. Much better implementation of Genotype likelihoods -- the system should scream along now. Continuing progress towards deleting old model depristo 2009-08-28 00:17:24 +0000
  • ab9458d06d support for scala walkers depristo 2009-08-28 00:15:01 +0000
  • 7cf9a54b64 change for new char/byte in BaseUtils depristo 2009-08-27 23:47:56 +0000
  • a639459112 Trivial consistency change from char in to char out, not char in to byte out depristo 2009-08-27 23:37:37 +0000
  • 6012f7602b @ minor fixes to CoverageAndPowerWalker and AnalyzePowerWalker (switching to By Reference traversal, spitting out Syzygy position for sanity check) chartl 2009-08-27 21:44:18 +0000
  • bd1e679bc5 @ Fixed issues with AnalyzePowerWalker which depended on CoverageAndPowerWalker. The latter was changed but not the former. Now fixed chartl 2009-08-27 20:23:41 +0000
  • a17dad5fa9 Converts from fastq.gz to unaligned BAM format. Accepts a single fastq (for single-end run) or two fastqs (for paired-end run). Also allows you to set certain BAM metadata (read groups, etc.). kiran 2009-08-27 20:20:09 +0000
  • 8740124cda @ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0. chartl 2009-08-27 19:31:53 +0000
  • 1da45cffb3 New: chartl 2009-08-26 21:42:35 +0000
  • 92ea947c33 Added binomProbabilityLog(int k, int n, double p) to MathUtils: chartl 2009-08-25 21:27:50 +0000
  • 478f426727 Fixed a missing method implementation in these two files. kiran 2009-08-25 21:21:58 +0000
  • f12ea3a27e Added ability for all filters to return a probability for a given variant - interpreted as the probability that the given variant should be included in the final set. The joint probability of all the filters is computed to determine whether a variant should stay or go. At the moment, this is only visible in verbose mode (specify -V). Also removed 'learning mode'; now, filters emit important stats no matter what. Various code cleanups. kiran 2009-08-25 21:17:56 +0000
  • e5115409fa Force columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA. hanna 2009-08-25 19:54:54 +0000
  • 811503d67b vcf changes from Richard's comments, fixed a test case aaron 2009-08-25 14:32:16 +0000
  • ee05ddde16 Added command line options to make the barcode analysis script executable by end users. andrewk 2009-08-24 21:15:09 +0000
  • ccdb4a0313 General-purpose management of output streams. hanna 2009-08-23 00:56:02 +0000
  • b316abd20f catch a malformed column header name more gracefully aaron 2009-08-21 21:05:28 +0000
  • 0364f8e989 added the ability of the VCFReader to take in compressed gzipped files natively, which is really useful for the validator aaron 2009-08-21 18:40:38 +0000
  • 647a367680 Made the size zero interval file checker emit a warnUser if we're not in unsafe mode. aaron 2009-08-21 14:40:57 +0000
  • df9133c90b the doc on File.length states it returns 0L if it doesn't exist, added a check to make sure it exists (and length < 1) aaron 2009-08-21 05:55:17 +0000
  • cd711d7697 Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric. aaron 2009-08-21 05:35:49 +0000
  • 0bdecd8651 A most stupid bug. In cases when more than one indel variant was present in a cleaned bam file, the "consensus" (max. # of occurrences) call was computed incorrectly, and most of the time the call itself was not made at all. Fortunately, the locations where we see multiple indels are a minority, and many of them are suspicious anyway (manifestation of alignment problems?). Could change results of POOLED calls though. asivache 2009-08-20 22:31:44 +0000