Commit Graph

  • c7b032cc88 missed a file in the add. aaron 2009-05-27 18:27:38 +0000
  • 3c3cd5bb64 Moving some of the data sharding around. A new shard category now exists, INTERVAL. This saved a lot of code that was mirroring the same approach in both the read and locus shard strategies. aaron 2009-05-27 18:24:31 +0000
  • 99524ab6d0 package name corrected asivache 2009-05-27 18:20:43 +0000
  • b76f8c4eb5 moved from playground to gatk asivache 2009-05-27 18:18:33 +0000
  • c3678c7bb9 moved from playground to gatk asivache 2009-05-27 18:18:08 +0000
  • 5b310e48f5 changed to use factored out Transcript class; some docs added (not much) asivache 2009-05-27 18:17:23 +0000
  • ae0bac5696 'made public' implies the 'public' keyword, actually... asivache 2009-05-27 17:57:01 +0000
  • 41c1a62ac4 formerly private class, factored out and made public. Represents a transcript annotation (transcript id, genomic location, genomic intervals for all exons present in this transcript, etc) asivache 2009-05-27 17:52:38 +0000
  • 8edba13ded Unit tests for the reference views. Partially addresses GSA-25. hanna 2009-05-27 17:49:45 +0000
  • 9bd6489f8e Output indels in the format appropriate for low-coverage indel submission ebanks 2009-05-27 17:32:15 +0000
  • 3098ed091c checking in new folder for perl scripts AND a simple script that takes an input text file and reference dictionary (.fai) and performs stable sort of the input lines according to the contig order specified by the dictionary. Position of the contig field to sort on in the input lines is specified as --k POS option. Input lines may specify contigs that are not in the dictionary, in this case the additional contigs will be added at the end of the sorted output, after all known contigs. The sorting order between these additional contigs is simply the order in which they first appear in the input asivache 2009-05-27 16:34:55 +0000
  • 919e995b7f -Moved my walkers to indels directory -Removed entropy walker and replaced it with mismatch (column) walker -Some improvements to the cleaner (more to come) ebanks 2009-05-27 16:34:24 +0000
  • df8490a0cf Remove unused dependency on commons logging. hanna 2009-05-27 14:12:26 +0000
  • 864a1e81e3 Delete stale class from previous rethink of the traversal engine. hanna 2009-05-27 13:52:03 +0000
  • 6fab1a64fa Started work on GLF input / output basics. Do not use. aaron 2009-05-26 22:49:59 +0000
  • b81135c606 bug fixed; this rod seems to work now... asivache 2009-05-26 22:25:34 +0000
  • c72601322a now returns the farm id when submitting a job! depristo 2009-05-26 22:23:24 +0000
  • a488d2dbb2 Lazy creation of output streams. Only create output streams when absolutely necessary. hanna 2009-05-26 21:56:57 +0000
  • ab7bb5800a forgot to remove debug print statement asivache 2009-05-26 21:38:27 +0000
  • 568a0d3c27 exon coordinates are now parsed correctly (?). IF DELIMITER IS THE LAST CHARACTER IN A STRING, String.split() DOES NOT return empty field as the last one; instead, the last field returned will be the one immediately before such delimiter! Wicked. asivache 2009-05-26 21:36:50 +0000
  • f4119c17de still working on it... asivache 2009-05-26 21:07:38 +0000
  • d73f2e95cc refseq added to the list of known rod types asivache 2009-05-26 21:06:44 +0000
  • 23b7a28015 simple walker that works off pre-computed tumor/normal genotyping calls (e.g. samtools pileup). Collects overall stats and also writes somatic variants into IGV-compatible bed file if asked to. NOT finished. NOT tested asivache 2009-05-26 21:05:47 +0000
  • 8f1cabd33d cmd line args changed - again; internally uses VariantType enum asivache 2009-05-26 21:03:58 +0000
  • 9ef1a21112 minor changes asivache 2009-05-26 21:03:06 +0000
  • d994544c47 Added back end code support for Sharding based on genomic location for reads. Changed the sharding code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simpler and cleaner test cases. aaron 2009-05-26 20:57:46 +0000
  • 4edcdffe45 refseq annotation track: should be able to provide (multiple) transcript annotations available over a given genomic position. NOT finished and NOT tested! asivache 2009-05-26 20:07:15 +0000
  • 149cc9989b spaces!!!!!!!!! andrewk 2009-05-26 19:40:25 +0000
  • c2df35b7fe - get leftmost position of indel correct - don't try to clean reads with mapping quality of 0 - un-deprecate ebanks 2009-05-26 17:24:58 +0000
  • 54bb643d19 Validated Mark's assertion that GSA-27 is fixed. Also did some cleanup on the pileup walker so that it doesn't output to System.out. hanna 2009-05-26 15:58:21 +0000
  • 008d677bea Fixed ValidatingPileup to work with Andrey's new rodSAMPileup -> GenotypeList type hierarchy. Fixed reference-ordered data validation system to validate class hierarchies instead of specific class types. hanna 2009-05-23 20:50:28 +0000
  • d056f9f3e8 Changed the name to reflect the sorted nature of the set, added some fixes aaron 2009-05-22 22:34:24 +0000
  • 831d430025 Added a collection for storing GenomeLocs, that also has functions for removing by genomic region (that may span multiple GenomeLoc's in the collection), and adding regions, which are then merged with any overlapping regions. aaron 2009-05-22 21:52:40 +0000
  • 34413362fd Bugfix: handle case where queue is empty. hanna 2009-05-22 21:45:22 +0000
  • ec2e8d5726 Fixes for getting ValidatingPileup running in parallel. hanna 2009-05-22 21:20:24 +0000
  • cd80e3f372 Replaced dumb training function with a version that creates a training set slightly more sensibly. kiran 2009-05-22 19:34:33 +0000
  • 02c0afdb85 Added the ability to specify the sorted, unaligned bam and/or the sorted, aligned bam such that broken computations can be restarted. kiran 2009-05-22 19:33:34 +0000
  • 454a6d1df7 Fixed an egregious error in simpleReverseComplement wherein the RC'd string would be composed entirely of the last base. kiran 2009-05-22 19:32:20 +0000
  • 2a5be1debe Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use. hanna 2009-05-22 19:12:00 +0000
  • 02fc4f145f refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class asivache 2009-05-22 18:54:40 +0000
  • 4b718688d5 no changes, really, just synchronizing (instead of reversing) to increase the amount of entropy asivache 2009-05-22 17:27:28 +0000
  • 893f1b6427 updated asivache 2009-05-22 17:25:50 +0000
  • a9dfbfb309 internal changes and some refactoring. slightly different final report. Now can take tracks that implement either Genotype or GenotypeList; takes an arg specifying what variants to look for (POINT - aka snp - or INDEL); takes an arg specifying whether default ref/ref call of one type (INDEL/POINT) should be implicitly assumed if another call (POINT/INDEL respectively) was made at the same position [this is probably most useful for indels and only (?) for sam pileups: if we have only point mutation call at a given position, it does mean that we do have coverage, and that there was no evidence whatsoever for an indel, so we have an implicit 'no-indel' call] asivache 2009-05-22 17:25:09 +0000
  • d5bb4d9ba9 Auxiliary class that can read one line from samtools pileup file. Used by rodSAMPileup to read pairs of lines as needed. NOTE: this class implements Genotype and (a trivial) GenotypeList, but it is NOT a rod! asivache 2009-05-22 17:20:01 +0000
  • 732fed9aad ALERT, ALERT! rodSAMPileup is now a GenotypeList, not a Genotype! Now it can intelligently read full samtools pileup files (containing, in general, both point and indel genotypes at the same position). No need to split/synchronize pileups from different individuals anymore, hooray! asivache 2009-05-22 17:17:59 +0000
  • 26633957d9 Genotype interface is extended: now it requires implementing object to be able to tell whether it isPointGenotype() or isIndelGenotype() (and the contract requires, e.g. alleles to be represented differently) asivache 2009-05-22 17:14:46 +0000
  • d9fc84f1e3 actually checking in the first pass depristo 2009-05-22 17:13:27 +0000
  • 8773b3a430 a trivial wrapper interface for the objects capable of holding 'full' genotype, i.e. both point (as in ref/snp) and indel variants at the same reference position asivache 2009-05-22 17:12:01 +0000
  • 7a979859a9 Intermediate checking for evaluation -- now supports transition / transversion evaluation depristo 2009-05-22 17:05:06 +0000
  • f2ea193149 For some reason the apostrophes in the comments were throwing annoying compile-time warnings: "unmappable character for encoding UTF8" ebanks 2009-05-22 14:07:07 +0000
  • 9902ce8073 properly flush the gzip output stream. this was a subtle inheritance bug. jmaguire 2009-05-22 13:57:58 +0000
  • 63caca31bf minor update in report printout format asivache 2009-05-22 13:56:09 +0000
  • 7afc10fd6f updated, reports more stuff now, including stats for external consistency checks asivache 2009-05-21 22:28:18 +0000
  • 30c63daf89 More improvements to the duplicate quality combiner, making progress towards a clean system depristo 2009-05-21 22:26:57 +0000
  • 04e51c8d1d Better version of MergeBAMBatch -- more options for creating the file depristo 2009-05-21 22:26:19 +0000
  • 65995887fc Releasable version of the Pileup walker depristo 2009-05-21 22:25:37 +0000
  • dc17a5661d Better accessors for dealing with second base prob pileups depristo 2009-05-21 22:25:16 +0000
  • d261459c48 Useful function to create a string with N copies of a same char depristo 2009-05-21 22:23:52 +0000
  • 287bb52e81 Refreshes the mount points that we'll be using (so that the program will play nicely with LSF). kiran 2009-05-21 20:36:12 +0000
  • b5ad5176f7 stick headers on the output tables jmaguire 2009-05-21 20:35:50 +0000
  • 83e1454a11 Added a method to determine the fraction of a sequence that's taken up by the most frequent base. kiran 2009-05-21 20:35:31 +0000
  • bdf772f017 Added test for determining the fraction of a sequence that's taken up by the most frequent base (quick-and-dirty homopolymer testing). kiran 2009-05-21 20:35:08 +0000
  • d61a5261c1 Better integration of reference-ordered data into the data sharding system. hanna 2009-05-21 20:09:32 +0000
  • 0d58e4ccc9 -check original alignments for indels when computing mismatch score -move logging to debug ebanks 2009-05-21 19:55:42 +0000
  • 5f67914b08 Added loads of documentation. kiran 2009-05-21 19:40:47 +0000
  • 1a9d5cea29 Added a method to reverse-complement a String object, preserving 'N' and '.' bases. kiran 2009-05-21 19:39:39 +0000
  • 1a3ca97d29 remove the ivy command for dependency on BCEL, we're not using it right now. aaron 2009-05-21 19:35:53 +0000
  • a687c6bc03 Added a method to refresh an NFS mount point (necessary to prevent NFS flakiness when running on the LSF farm). kiran 2009-05-21 19:31:54 +0000
  • 324ef9cbd1 Test class for PathUtils. kiran 2009-05-21 19:31:22 +0000
  • 8515247575 Adding some functions I keep reinventing, especially for testing purposes. aaron 2009-05-21 19:30:44 +0000
  • e6200fe5b5 don't ignore reads when maxReadLength isn't set; also, print out LOD score for cleaning ebanks 2009-05-21 19:24:10 +0000
  • 0219d33e10 QualityUtils: added reverse function to reverse an array of bytes (and not complement it), BaseUtils: split qualToProb into itself and qualToErrProb, CovariateCounterWalker and LogisticRecalibrationWalker: several changes including proper accounting (only partly complete) for reversing AND complementing bases that are negative strand, PrintReadsWalker: created option to output reads to a BAM file rather than just to the screen (useful for creating a downsampled BAM file) andrewk 2009-05-21 18:30:45 +0000
  • 7e77c62b49 auxiliary class, a simple struct to keep together info like numbers of covered, assessed, ref/variant bases across the sample asivache 2009-05-21 16:30:16 +0000
  • 7e5e422591 ReferenceOrderedData now inspects the ROD class using reflection. If the ROD declares a static Iterator<ROD> createIterator(String rodName, File rodFile) factory method, it is wrapped and used by the ReferenceOrderedData to read records from rodFile. If the ROD does not provide such factory method, the old behavior is the default: ReferenceOrderedData uses its own simple default iterator to read the file line by line (assuming there is only one line per record/position). asivache 2009-05-21 15:23:22 +0000
  • 26dd3cd50e Cleanup. Move filtering functions closer to where they're used. hanna 2009-05-20 21:42:48 +0000
  • e7a6f8cdc4 Removed evidence of a previous incarnation of data sharding. hanna 2009-05-20 20:48:33 +0000
  • 3cad580655 Catch and rethrow the walker's required argument, so that command-line arguments will be displayed when the GATK throws an argument exception. hanna 2009-05-20 19:17:16 +0000
  • dc748d9c9c Integrate more feedback on command-line argument system. Focus on help formatter: separate required from optional but otherwise keep ordering the same, reorder GATK arguments by usage. hanna 2009-05-20 19:01:25 +0000
  • 34f9820299 update mapping quality score and edit distance attribute for reads when they are cleaned ebanks 2009-05-20 17:51:31 +0000
  • 57918de753 add the @Requires for this walker ebanks 2009-05-20 17:03:12 +0000
  • 747521c849 Fixed the simplest of typos. kiran 2009-05-20 16:00:30 +0000
  • e48078b476 Updated to reflect change to BasecallingReadModel constructor. kiran 2009-05-20 15:43:26 +0000
  • 505f588768 Forgot to say that the mate is unmapped too. This is necessary to prevent SAM-JDK from yelling at me about an invalid SAM file. kiran 2009-05-20 15:38:51 +0000
  • 96e73e496a Delete deprecated old-school traversals. hanna 2009-05-20 14:57:17 +0000
  • 3b1f84e15b Slightly improved interface to merging utility for multiple bam files depristo 2009-05-20 12:54:41 +0000
  • b840dd1320 Added some code to change the instrumentation for tests. aaron 2009-05-20 05:15:27 +0000
  • c34eaa6962 add javassist, which is a less lower level version of bcel. aaron 2009-05-20 05:11:03 +0000
  • 6c5fbb988b Now basecalls an entire read (both ends of the pair, barcode... everything) at once. After, RawRead and FourProbRead can be asked to return a specified subset (corresponding to the ranges specified for each end of the read). kiran 2009-05-20 00:09:20 +0000
  • e293d65ede Refactored to allow the user to specify the range of cycles they wish to call. Simply specify a single range (i.e. '0-75') or two ranges ('0-75,76-151'). This allows single and paired-end read processing to coexist happily. Also implements annotation of an aligned bam file (which should hopefully fit in under two gigs now, but I'm waiting on a bug fix or a clarification from the Picard team). kiran 2009-05-20 00:07:24 +0000
  • 08c9f4d86b Renamed to BasecallingTrainer. kiran 2009-05-20 00:03:46 +0000
  • 01a3cb27c7 @Required / @Allows flags for main arguments. hanna 2009-05-19 23:26:17 +0000
  • 40dbc21df7 Moved ParseException to its own file and made it public. kiran 2009-05-19 14:42:44 +0000
  • ff798fe483 Reintroduce support for interval-based traversals. hanna 2009-05-18 22:54:18 +0000
  • e9f85ef920 Better merge support depristo 2009-05-18 21:18:51 +0000
  • 3441795d9c better handling of edge cases (zero coverage, reference mistakes, etc.) jmaguire 2009-05-18 18:04:37 +0000
  • 7c615c8fb0 Some changes to the system for annotating a pre-aligned bam file. Doesn't fit within 2gigs. kiran 2009-05-18 17:42:08 +0000
  • a39c8839c8 print percentage sign! asivache 2009-05-18 14:38:20 +0000
  • 9dec783a82 Actually writes out a good header now depristo 2009-05-18 13:34:52 +0000
  • c10741e9f5 Rename TraverseLociByReference to TraverseLoci to represent its new function. hanna 2009-05-18 01:31:57 +0000
  • e6ce80c8e3 Fix for GSA-44...don't throw exception when user specifies -h. hanna 2009-05-18 00:42:00 +0000
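
Note on commit 568a0d3c27: the String.split() behavior described there (a trailing delimiter does not produce a trailing empty field) can be reproduced with a minimal sketch like the one below. The class name SplitDemo, the helper method names, and the sample input string are hypothetical and are not taken from the repository; only String.split(String) and String.split(String, int) are standard Java calls. Passing a negative limit is one common way to keep trailing empty fields; whether the actual fix in that commit used it is not stated in the log.

```java
import java.util.Arrays;

class SplitDemo {
    // One-argument split: trailing empty strings are removed from the result.
    static String[] splitDefault(String s) {
        return s.split(",");
    }

    // Negative limit: the pattern is applied as many times as possible
    // and trailing empty strings are kept.
    static String[] splitKeepTrailing(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        // Hypothetical exon-coordinate field that ends with a delimiter.
        String exonStarts = "100,200,300,";

        String[] dropped = splitDefault(exonStarts);
        String[] kept = splitKeepTrailing(exonStarts);

        System.out.println(dropped.length + " fields: " + Arrays.toString(dropped));
        System.out.println(kept.length + " fields: " + Arrays.toString(kept));
    }
}
```

With the sample input, the default call yields 3 fields and the negative-limit call yields 4, the last one being the empty string.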