Commit Graph

  • 25365ccb07 Adding more walkers to the base GATK package. hanna 2009-07-14 20:40:52 +0000
  • efcbb16688 un-deprecate this ROD and make it implement Genotype ebanks 2009-07-14 19:45:41 +0000
  • 84d407ff3f Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant depristo 2009-07-14 18:53:27 +0000
  • 76b09a879b Display a more intelligent error message if the user runs a locus traversal across an unmapped reads file. hanna 2009-07-14 18:36:09 +0000
  • 99ddd8ab15 bug fix for transitioning between chromosomes in GLF output aaron 2009-07-14 17:58:04 +0000
  • 7d755a4c90 GenotypeLikelihoods doesn't emit metrics, they don't make sense aaron 2009-07-14 17:22:28 +0000
  • 01fc8da270 adding the GenotypeLikelihoodsWalker, which generates GLF genotype likelihoods that are pretty much identical to the samtools calls. aaron 2009-07-14 16:57:18 +0000
  • 99f9cd84ed Warning for possibly mismatched reads / reference was very aggressive. Relax the criteria a bit. hanna 2009-07-14 16:21:22 +0000
  • 12b5d9c70c The number of loci can easily overflow an int. Change reduce type to a Long. hanna 2009-07-14 16:07:00 +0000
  • 5bf7647498 0.2.3 -- now preserves Q0 bases throughout the reads depristo 2009-07-14 12:27:31 +0000
  • 36819ed908 Initial changes to the SSG to output GLF by default aaron 2009-07-14 08:46:04 +0000
  • 0f6bfaaf73 Skip validation in case of no reads aligning. hanna 2009-07-14 02:03:36 +0000
  • a1d33f8791 -Added walker to dump strand test results to file -Refactored strand filter to handle calls from the walker ebanks 2009-07-14 01:56:50 +0000
  • bfe90af5e2 Some quick and dirty fixes to support querying unmapped BAM files. hanna 2009-07-14 01:25:20 +0000
  • e4152af387 added a big speed-up for interval list input processing. With large interval sets this was taking way too long... aaron 2009-07-13 22:00:00 +0000
  • 9f0fb9f3aa Fix for GSA-90: GATK banner and error messages should point to the wiki website. hanna 2009-07-13 21:56:41 +0000
  • b18caa2052 Fix for GSA-90: System isn't failing with an error when you use the wrong reference. hanna 2009-07-13 20:42:12 +0000
  • 52659d02d4 ignore unmapped reads in all the indel walkers (since they're giving me overhead issues) ebanks 2009-07-13 16:51:11 +0000
  • 5c321f9630 Oops! Accidentally deactivated the ArgumentFactory, needed by the CleanedReadInjector, while refactoring last night. hanna 2009-07-13 16:41:55 +0000
  • b61f9af4d7 Cleaning up, preparing to incorporate a better fix for Eric's problems with validation stringency in BAM files opened directly from the walkers. hanna 2009-07-13 01:42:13 +0000
  • 4c02607297 genotyper also needs to have 454 reads filtered out ebanks 2009-07-10 23:19:28 +0000
  • dea72c576e use the filter to ignore 454 reads in the traversal to speed up cleaning (since there's less area to actually clean against) ebanks 2009-07-10 18:34:44 +0000
  • 0070b8ea6a Until 454 goes far, far away, at least we can completely ignore it ebanks 2009-07-10 18:31:53 +0000
  • 1401606344 move warning about strictly adjacent intervals in a contig from 'remap' to 'read', so it is issued only once asivache 2009-07-10 17:58:11 +0000
  • aa4f60d980 Make sure that only reads marked as 'mapped' are filtered based on validity of alignment. hanna 2009-07-10 17:44:06 +0000
  • e01d37024a now updates mapping quality (to an arbitrary chosen value of 37 if the resulting mapping is unique) and X0, X1 tags after remapping (in REDUCE mode) asivache 2009-07-10 16:40:52 +0000
  • b08b121756 synchronizing; debug statements commented out, so nothing changed really asivache 2009-07-10 16:38:33 +0000
  • a1eb128377 few more detailed debug printouts conditioned on if (DEBUG), so no real changes... asivache 2009-07-10 16:36:57 +0000
  • 08c4fb86e3 Derive examples from real data. hanna 2009-07-10 04:21:37 +0000
  • 03e1713988 Better support for specifying read filters to apply directly from the walkers. hanna 2009-07-09 23:59:53 +0000
  • ce08f5f0c3 Removed some unused variables, fixed some javadoc. The usual. aaron 2009-07-09 22:10:22 +0000
  • 9cfd89c54f a small refactoring, and some documentation cleanup aaron 2009-07-09 22:03:45 +0000
  • d86717db93 Refactoring of the traversal engine base class, I removed a lot of old code. aaron 2009-07-09 21:57:00 +0000
  • 3519323156 Output the correct geli text format ebanks 2009-07-09 19:45:18 +0000
  • 99631cdaa1 fix and then deprecate the rodGELI class (GELIs suck) ebanks 2009-07-09 19:18:13 +0000
  • 60a86fb34a Better handling of fasta files with non-standard extensions. hanna 2009-07-09 18:18:48 +0000
  • 5e26770634 Hack the MicroScheduler to be tolerant of RefWalkers. We need to implement a longer-term solution to make it easier for datasources to report problems they've encountered along the way (GSA-103). hanna 2009-07-09 17:26:59 +0000
  • bc44e08225 refactored output logic kcibul 2009-07-09 16:13:01 +0000
  • 3fe7104963 Added walker to filter out clustered SNPs from a call set ebanks 2009-07-09 03:16:27 +0000
  • 8ee5c7de8e GLF reader and writer check in. aaron 2009-07-08 23:06:37 +0000
  • c8fcecbc6f Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker. andrewk 2009-07-08 22:04:26 +0000
  • 3f0304de5a Get rid of unused iterator. hanna 2009-07-08 20:39:16 +0000
  • da4d26b1ea Enum support for command-line argument system, and some cleanup for hacks to the CleanedReadInjector that were required because Enum support was missing. hanna 2009-07-08 20:26:16 +0000
  • aacec3aeb0 rod for binary GELI files (still needs to be tested) ebanks 2009-07-08 20:25:56 +0000
  • e106cf73d8 A quick change to provide more verbose output. aaron 2009-07-08 19:08:19 +0000
  • 433ad1f060 Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application. hanna 2009-07-08 18:49:08 +0000
  • 0a67386525 . jmaguire 2009-07-08 16:59:36 +0000
  • d8fbb2b62c Refactoring; make a better home for the MalformedReadFilteringIterator. hanna 2009-07-08 16:54:20 +0000
  • c78a72e775 Applies Fisher's Exact Test to determine whether there's a strand bias and, if so, filters the call out. kiran 2009-07-08 16:14:11 +0000
  • b211f500a3 Applies secondary base feature to variants. kiran 2009-07-08 16:13:29 +0000
  • 6e31057e6b Some changes involving output of marginal calls to different, per-filter files. kiran 2009-07-08 16:12:57 +0000
  • 787c84d68b only compare pair position for paired end reads ebanks 2009-07-08 04:07:08 +0000
  • d3daecfc4d Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, update indices in CoverageEval.py, and made a lot of changes to CoverageWalker biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself. andrewk 2009-07-08 02:05:40 +0000
  • 4ba2194b5e Filter reads whose alignment starts past the end of the contig to which it allegedly aligns. hanna 2009-07-07 22:27:44 +0000
  • 194b75613b Fix compile problem with unit tests. hanna 2009-07-07 20:29:31 +0000
  • 1d84c9da96 sortByRef now supports x:y location syntax depristo 2009-07-07 16:42:40 +0000
  • 1db15ee468 made some things protected so that I can inherit them in MultiSampleCallerAccuracyTest jmaguire 2009-07-07 15:50:28 +0000
  • 1fa71aa31d Now outputs stats. Doesn't do the downsampling thing because I think I'll have enough counts. jmaguire 2009-07-07 15:29:31 +0000
  • 5d7393d7cb Temporary fix for Eric's problems with SOLiD reads: make sure the command-line argument system takes the --validation-strictness command-line argument into account when creating SAMFileReaders. hanna 2009-07-07 15:18:05 +0000
  • f5b00c20d0 Updated python files depristo 2009-07-07 14:15:39 +0000
  • f6a273a537 other fixes for some broken unit tests aaron 2009-07-07 05:53:13 +0000
  • 033bafe7a1 fixed sam by reads test for the new filtering code aaron 2009-07-07 05:45:50 +0000
  • 2a86f2f833 an initial pass at the GLF reader, and some other genotype changes to phase out the LikelihoodObject I created. aaron 2009-07-07 04:30:27 +0000
  • 5735c87581 Basic infrastructure for filtering malformed reads. hanna 2009-07-06 22:50:22 +0000
  • b9d533042e Two-tailed HardyWeinberg test implemented. VariantEval now separates violations from summary outputs for clarity; Fixing problems with CovariateCounterTest and TabularRodTest depristo 2009-07-06 22:02:04 +0000
  • 31313481f6 Temporary patch to filter out bad alignments that aren't quite fully reported as bad. hanna 2009-07-06 18:41:55 +0000
  • 6580211c2a First version of depth of coverage filter. Right now it takes in a maximum coverage threshold given by the user. mmelgar 2009-07-06 18:22:46 +0000
  • fac7ac5142 Don't print out 0 coverage (which is always 0) ebanks 2009-07-06 17:44:32 +0000
  • d19366eaad Cleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes. hanna 2009-07-06 15:41:30 +0000
  • 000d92a545 added gc calculation kcibul 2009-07-06 13:07:04 +0000
  • 338cdbebad deal with screwy solid reads in the cleaner (no cigar strings) ebanks 2009-07-05 16:49:58 +0000
  • 8bcbf7f18a First draft of multi sample caller accuracy test. jmaguire 2009-07-05 16:29:13 +0000
  • 4019cd2bd7 Added ROD for parsing hapmap3 genotype files. Tweak to TabularROD to allow HapMapGenotypeROD to work. Added HapMapGenotypeROD to list of RODs in ReferenceOrderedData.java. Modified MultiSampleCaller to return a single object with most of the relevant information. jmaguire 2009-07-05 16:28:24 +0000
  • e5e249d4ac temporary fix to deal with screwy SOLiD reads ebanks 2009-07-05 03:25:57 +0000
  • cf1854b339 Fix for monstrous problems with solid data -- now can dynamically expand recalibration tables on the fly as reads declare additional read groups -- use assumeFaultyHeader flag depristo 2009-07-03 17:15:49 +0000
  • bcda66d2db Simple performance improvements depristo 2009-07-03 16:45:23 +0000
  • 0d00823332 Fix for performance bug in extending the read with X's in cases where the read is aligned off the end of the contig. hanna 2009-07-03 16:17:38 +0000
  • be2f8478c0 added suppression of failure messages kcibul 2009-07-03 15:19:37 +0000
  • 25c30b12bb added MAF-style output kcibul 2009-07-03 15:10:19 +0000
  • dcb8892568 Lot of code for coverage evaluation tools including first version of python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement) which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker andrewk 2009-07-03 08:07:02 +0000
  • d0cef5ff9d Oops. Specified incorrect classname in package for depth of coverage walker. hanna 2009-07-02 21:40:40 +0000
  • d603145cb0 Meaning of input arguments has CHANGED: minFraction is now a minimum fraction of CONSENSUS indel observation, out of all reads covering the site, required to make the call. minConsensusFraction is still the minimum fraction of CONSENSUS indel observation out of all indel observations at the site asivache 2009-07-02 20:38:10 +0000
  • 62807139fc Cleanup pileup and depth of coverage in preparation for release. Add pileup, depth of coverage, and print reads to package for distribution. hanna 2009-07-02 14:54:01 +0000
  • 6a25f0b9c5 refactored into new package kcibul 2009-07-02 14:37:54 +0000
  • 1c83b4d949 forgot to take out some test code aaron 2009-07-02 14:18:37 +0000
  • bc17ff567a When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions. Now with a test case! aaron 2009-07-02 14:15:50 +0000
  • 47cb9f169e Stable tool that's the reverse of merging -- splits a file into individual BAM files, one for each sample ID in the SAM header depristo 2009-07-02 12:56:46 +0000
  • 6684cb8bc9 copySamFileHeader() utility function depristo 2009-07-02 12:55:51 +0000
  • bb92eb8b1c added a fix for overlapping reads in the locus context aaron 2009-07-02 02:08:59 +0000
  • 6570ce0b5b added the example files to the distro aaron 2009-07-01 23:16:42 +0000
  • 9d659199f3 bam and fasta files generated with the artificial tools. These will be included in the GATK distro. aaron 2009-07-01 22:49:31 +0000
  • d4d3af20f2 made a fake fasta generator, so we can now generate a complete bam / fasta combo of made up data. aaron 2009-07-01 21:35:34 +0000
  • c2e5a68aaf output format changed in --verbose --somatic mode: now also prints the <#reads with indels>/<coverage> for normal samples, rather than only for the tumor; also, code cleaned up a little asivache 2009-07-01 20:56:16 +0000
  • 4cbf069de1 First version of coverage evaluation tool andrewk 2009-07-01 20:52:25 +0000
  • 7462f3f344 Bug in setContig() fixed: sequence dictionary's .getSequences().contains() and .getSequences().indexOf() do NOT work when applied to contig names (Strings), since getSequences() returns a list of SAMSequenceRecord's; changed to querying the dictionary directly for specified contig name asivache 2009-07-01 20:50:09 +0000
  • 76fd4b3848 deal with different contigs ebanks 2009-07-01 19:17:27 +0000
  • 20fab507a8 Choose the REF if it scores equal to consensus! ebanks 2009-07-01 18:54:27 +0000
  • 9b182e3063 Prep for documenting command-line arguments: delete some arguments that don't make sense any more given the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework). hanna 2009-07-01 18:23:35 +0000
  • 5a5103cfd2 Heads up, everyone: command-line args no longer need to be public. ebanks 2009-07-01 16:09:22 +0000
  • b43d4d909e Fix CleanedReadInjectorTest to work with new CleanedReadInjector. hanna 2009-07-01 15:48:06 +0000
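
Note on c78a72e775: that commit applies Fisher's Exact Test to decide whether a call shows strand bias. The following is a minimal Python sketch of the general technique, not the code from the commit. The function names, the layout of the 2x2 table (reference/alternate allele by forward/reverse strand), and the 0.05 cutoff are all assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = hypergeom_pmf(a, b, c, d)
    lowest = max(0, col1 - row2)   # smallest feasible top-left cell
    highest = min(row1, col1)      # largest feasible top-left cell
    p = 0.0
    for x in range(lowest, highest + 1):
        p_x = hypergeom_pmf(x, row1 - x, col1 - x, row2 - col1 + x)
        # Small relative tolerance so equally likely tables are counted.
        if p_x <= p_obs * (1 + 1e-9):
            p += p_x
    return min(p, 1.0)


def has_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev, alpha=0.05):
    """Hypothetical filter: flag a call when allele and strand look dependent.

    alpha is an assumed threshold, not a value taken from the commit.
    """
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev) < alpha
```

For example, has_strand_bias(10, 0, 0, 10) flags a site where every reference read is on the forward strand and every alternate read is on the reverse strand, while has_strand_bias(10, 10, 10, 10) does not.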