c7b032cc88 missed a file in the add.
aaron
2009-05-27 18:27:38 +0000
3c3cd5bb64 Moving some of the data sharding around. A new shard category now exists, INTERVAL. This saved a lot of code that was mirroring the same approach in both the read and locus shard strategies.
aaron
2009-05-27 18:24:31 +0000
99524ab6d0 package name corrected
asivache
2009-05-27 18:20:43 +0000
b76f8c4eb5 moved from playground to gatk
asivache
2009-05-27 18:18:33 +0000
c3678c7bb9 moved from playground to gatk
asivache
2009-05-27 18:18:08 +0000
5b310e48f5 changed to use factored out Transcript class; some docs added (not much)
asivache
2009-05-27 18:17:23 +0000
41c1a62ac4 formerly private class, factored out and made public. Represents a transcript annotation (transcript id, genomic location, genomic intervals for all exons present in this transcript, etc.)
asivache
2009-05-27 17:52:38 +0000
8edba13ded Unit tests for the reference views. Partially addresses GSA-25.
hanna
2009-05-27 17:49:45 +0000
9bd6489f8e Output indels in the format appropriate for low-coverage indel submission
ebanks
2009-05-27 17:32:15 +0000
3098ed091c checking in new folder for Perl scripts AND a simple script that takes an input text file and a reference dictionary (.fai) and performs a stable sort of the input lines according to the contig order specified by the dictionary. The position of the contig field to sort on in the input lines is specified with the --k POS option. Input lines may specify contigs that are not in the dictionary; in this case the additional contigs are added at the end of the sorted output, after all known contigs. The sorting order among these additional contigs is simply the order in which they first appear in the input.
asivache
2009-05-27 16:34:55 +0000
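The ordering rule above can be sketched in Java: dictionary contigs come first, unknown contigs follow in order of first appearance, and ties keep their input order. This is not the Perl script from the commit, and the class name `ContigSort` is hypothetical. The sketch assumes tab-separated lines and a 0-based field index; the script's own `--k POS` convention is not stated. The dictionary contig names would come from the first column of the .fai file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContigSort {
    // Stable sort of tab-separated lines by the contig in column keyField (0-based, an assumption).
    // dictContigs is the contig order from the reference dictionary (first column of a .fai file).
    static List<String> sortByContig(List<String> lines, List<String> dictContigs, int keyField) {
        Map<String, Integer> rank = new HashMap<>();
        for (String contig : dictContigs) {
            rank.putIfAbsent(contig, rank.size());
        }
        // Contigs missing from the dictionary get ranks after all known ones, in first-seen order.
        for (String line : lines) {
            rank.putIfAbsent(contigOf(line, keyField), rank.size());
        }
        List<String> sorted = new ArrayList<>(lines);
        // List.sort is stable, so lines on the same contig keep their input order.
        sorted.sort(Comparator.comparingInt(line -> rank.get(contigOf(line, keyField))));
        return sorted;
    }

    static String contigOf(String line, int keyField) {
        return line.split("\t", -1)[keyField];
    }
}
```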
919e995b7f Moved my walkers to indels directory; removed entropy walker and replaced it with mismatch (column) walker; some improvements to the cleaner (more to come)
ebanks
2009-05-27 16:34:24 +0000
df8490a0cf Remove unused dependency on commons logging.
hanna
2009-05-27 14:12:26 +0000
864a1e81e3 Delete stale class from previous rethink of the traversal engine.
hanna
2009-05-27 13:52:03 +0000
6fab1a64fa Started work on GLF input / output basics. Do not use.
aaron
2009-05-26 22:49:59 +0000
b81135c606 bug fixed; this rod seems to work now...
asivache
2009-05-26 22:25:34 +0000
c72601322a now returns the farm id when submitting a job!
depristo
2009-05-26 22:23:24 +0000
a488d2dbb2 Lazy creation of output streams. Only create output streams when absolutely necessary.
hanna
2009-05-26 21:56:57 +0000
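One way to create output streams lazily is a wrapper that opens its underlying stream only on the first write. The sketch below is a hypothetical `LazyOutputStream`, not the GATK class from this commit. It assumes the real stream can be produced by a `Supplier<OutputStream>`. A file sink would need its supplier to rethrow `FileNotFoundException` unchecked, because `Supplier` cannot throw checked exceptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.Supplier;

// Wrapper that defers opening the real stream until something is written to it.
public class LazyOutputStream extends OutputStream {
    private final Supplier<OutputStream> opener;
    private OutputStream delegate; // null until the first write

    public LazyOutputStream(Supplier<OutputStream> opener) {
        this.opener = opener;
    }

    public boolean isOpened() {
        return delegate != null;
    }

    private OutputStream target() {
        if (delegate == null) {
            delegate = opener.get(); // the file (or other sink) is created only here
        }
        return delegate;
    }

    @Override
    public void write(int b) throws IOException {
        target().write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        target().write(b, off, len);
    }

    // Flushing or closing a stream that was never written to does not open it.
    @Override
    public void flush() throws IOException {
        if (delegate != null) delegate.flush();
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) delegate.close();
    }
}
```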
ab7bb5800a forgot to remove debug print statement
asivache
2009-05-26 21:38:27 +0000
568a0d3c27 exon coordinates are now parsed correctly (?). IF THE DELIMITER IS THE LAST CHARACTER IN A STRING, String.split() DOES NOT return an empty field as the last one; instead, the last field returned will be the one immediately before that delimiter! Wicked.
asivache
2009-05-26 21:36:50 +0000
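The `String.split()` behavior noted above is documented Java behavior. The one-argument form acts like a limit of 0, so trailing empty strings are discarded; a negative limit keeps them. A minimal demonstration using a made-up comma-terminated coordinate list:

```java
import java.util.Arrays;

// Trailing empty strings are dropped by split(regex) but kept by split(regex, -1).
public class SplitDemo {
    // Default split: equivalent to split(",", 0), so trailing empty fields are removed.
    static String[] splitDefault(String s) {
        return s.split(",");
    }

    // Negative limit: every field is kept, including trailing empty ones.
    static String[] splitKeepTrailing(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        String exonStarts = "100,200,300,";
        System.out.println(Arrays.toString(splitDefault(exonStarts)));      // [100, 200, 300]
        System.out.println(Arrays.toString(splitKeepTrailing(exonStarts))); // [100, 200, 300, ]
    }
}
```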
f4119c17de still working on it...
asivache
2009-05-26 21:07:38 +0000
d73f2e95cc refseq added to the list of known rod types
asivache
2009-05-26 21:06:44 +0000
23b7a28015 simple walker that works off pre-computed tumor/normal genotyping calls (e.g. samtools pileup). Collects overall stats and also writes somatic variants into an IGV-compatible bed file if asked to. NOT finished. NOT tested
asivache
2009-05-26 21:05:47 +0000
d994544c47 Added back-end code support for sharding based on genomic location for reads. Changed the sharding code to take a GenomeLocSortedSet instead of a List<GenomeLoc>, and added a bunch of much simpler and cleaner test cases.
aaron
2009-05-26 20:57:46 +0000
4edcdffe45 refseq annotation track: should be able to provide (multiple) transcript annotations available over a given genomic position. NOT finished and NOT tested!
asivache
2009-05-26 20:07:15 +0000
c2df35b7fe get leftmost position of indel correct; don't try to clean reads with mapping quality of 0; un-deprecate
ebanks
2009-05-26 17:24:58 +0000
54bb643d19 Validated Mark's assertion that GSA-27 is fixed. Also did some cleanup on the pileup walker so that it doesn't output to System.out.
hanna
2009-05-26 15:58:21 +0000
008d677bea Fixed ValidatingPileup to work with Andrey's new rodSAMPileup -> GenotypeList type hierarchy. Fixed reference-ordered data validation system to validate class hierarchies instead of specific class types.
hanna
2009-05-23 20:50:28 +0000
d056f9f3e8 Changed the name to reflect the sorted nature of the set, added some fixes
aaron
2009-05-22 22:34:24 +0000
831d430025 Added a collection for storing GenomeLocs that also has functions for removing by genomic region (which may span multiple GenomeLocs in the collection) and for adding regions, which are then merged with any overlapping regions.
aaron
2009-05-22 21:52:40 +0000
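The merge-on-add and remove-by-region behavior can be sketched with a `TreeMap` keyed by interval start. This is a hypothetical `IntervalSet`, not GenomeLocSortedSet itself. It handles a single contig, uses half-open `[start, end)` coordinates, and merges only intervals that actually overlap, so abutting intervals stay separate.

```java
import java.util.Map;
import java.util.TreeMap;

// Sorted set of non-overlapping half-open intervals [start, end) on one contig.
public class IntervalSet {
    private final TreeMap<Long, Long> intervals = new TreeMap<>(); // start -> end

    // Adds [start, end), merging it with every stored interval it overlaps.
    public void add(long start, long end) {
        if (start >= end) return;
        Map.Entry<Long, Long> prev = intervals.floorEntry(start);
        if (prev != null && prev.getValue() > start) {
            start = prev.getKey();
            end = Math.max(end, prev.getValue());
            intervals.remove(prev.getKey());
        }
        Map.Entry<Long, Long> next = intervals.ceilingEntry(start);
        while (next != null && next.getKey() < end) {
            end = Math.max(end, next.getValue());
            intervals.remove(next.getKey());
            next = intervals.ceilingEntry(start);
        }
        intervals.put(start, end);
    }

    // Removes [start, end), which may span, trim, or split several stored intervals.
    public void remove(long start, long end) {
        if (start >= end) return;
        Map.Entry<Long, Long> prev = intervals.lowerEntry(start);
        if (prev != null && prev.getValue() > start) {
            intervals.put(prev.getKey(), start);      // keep the part before the region
            if (prev.getValue() > end) {
                intervals.put(end, prev.getValue());  // region was inside: keep the part after it
            }
        }
        Map.Entry<Long, Long> next = intervals.ceilingEntry(start);
        while (next != null && next.getKey() < end) {
            intervals.remove(next.getKey());
            if (next.getValue() > end) {
                intervals.put(end, next.getValue());  // keep the tail beyond the region
            }
            next = intervals.ceilingEntry(start);
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, Long> e : intervals.entrySet()) {
            sb.append('[').append(e.getKey()).append(',').append(e.getValue()).append(')');
        }
        return sb.toString();
    }
}
```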
34413362fd Bugfix: handle case where queue is empty.
hanna
2009-05-22 21:45:22 +0000
ec2e8d5726 Fixes for getting ValidatingPileup running in parallel.
hanna
2009-05-22 21:20:24 +0000
cd80e3f372 Replaced dumb training function with a version that creates a training set slightly more sensibly.
kiran
2009-05-22 19:34:33 +0000
02c0afdb85 Added the ability to specify the sorted, unaligned bam and/or the sorted, aligned bam such that broken computations can be restarted.
kiran
2009-05-22 19:33:34 +0000
454a6d1df7 Fixed an egregious error in simpleReverseComplement wherein the RC'd string would be composed entirely of the last base.
kiran
2009-05-22 19:32:20 +0000
2a5be1debe Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use.
hanna
2009-05-22 19:12:00 +0000
02fc4f145f refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class
asivache
2009-05-22 18:54:40 +0000
4b718688d5 no changes, really, just synchronizing (instead of reversing) to increase the amount of entropy
asivache
2009-05-22 17:27:28 +0000
a9dfbfb309 internal changes and some refactoring. slightly different final report. Now can take tracks that implement either Genotype or GenotypeList; takes an arg specifying what variants to look for (POINT - aka snp - or INDEL); takes an arg specifying whether default ref/ref call of one type (INDEL/POINT) should be implicitly assumed if another call (POINT/INDEL respectively) was made at the same position [this is probably most useful for indels and only (?) for sam pileups: if we have only point mutation call at a given position, it does mean that we do have coverage, and that there was no evidence whatsoever for an indel, so we have an implicit 'no-indel' call]
asivache
2009-05-22 17:25:09 +0000
d5bb4d9ba9 Auxiliary class that can read one line from samtools pileup file. Used by rodSAMPileup to read pairs of lines as needed. NOTE: this class implements Genotype and (a trivial) GenotypeList, but it is NOT a rod!
asivache
2009-05-22 17:20:01 +0000
732fed9aad ALERT, ALERT! rodSAMPileup is now a GenotypeList, not a Genotype! Now it can intelligently read full samtools pileup files (containing, in general, both point and indel genotypes at the same position). No need to split/synchronize pileups from different individuals anymore, hooray!
asivache
2009-05-22 17:17:59 +0000
26633957d9 Genotype interface is extended: now it requires the implementing object to be able to tell whether it isPointGenotype() or isIndelGenotype() (and the contract requires, e.g., alleles to be represented differently)
asivache
2009-05-22 17:14:46 +0000
d9fc84f1e3 actually checking in the first pass
depristo
2009-05-22 17:13:27 +0000
8773b3a430 a trivial wrapper interface for the objects capable of holding 'full' genotype, i.e. both point (as in ref/snp) and indel variants at the same reference position
asivache
2009-05-22 17:12:01 +0000
7a979859a9 Intermediate check-in for evaluation -- now supports transition / transversion evaluation
depristo
2009-05-22 17:05:06 +0000
f2ea193149 For some reason the apostrophes in the comments were throwing annoying compile-time warnings: "unmappable character for encoding UTF8"
ebanks
2009-05-22 14:07:07 +0000
9902ce8073 properly flush the gzip output stream. this was a subtle inheritance bug.
jmaguire
2009-05-22 13:57:58 +0000
63caca31bf minor update in report printout format
asivache
2009-05-22 13:56:09 +0000
7afc10fd6f updated, reports more stuff now, including stats for external consistency checks
asivache
2009-05-21 22:28:18 +0000
30c63daf89 More improvements to the duplicate quality combiner, making progress towards a clean system
depristo
2009-05-21 22:26:57 +0000
04e51c8d1d Better version of MergeBAMBatch -- more options for creating the file
depristo
2009-05-21 22:26:19 +0000
65995887fc Releasable version of the Pileup walker
depristo
2009-05-21 22:25:37 +0000
dc17a5661d Better accessors for dealing with second base prob pileups
depristo
2009-05-21 22:25:16 +0000
d261459c48 Useful function to create a string with N copies of the same char
depristo
2009-05-21 22:23:52 +0000
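A helper of this kind takes only a few lines in Java. The names `StringHelpers` and `dupString` are hypothetical. On Java 11 and later, `String.valueOf(c).repeat(n)` does the same job.

```java
import java.util.Arrays;

public class StringHelpers {
    // Returns a string consisting of n copies of c (hypothetical helper name).
    static String dupString(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }
}
```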
287bb52e81 Refreshes the mount points that we'll be using (so that the program will play nicely with LSF).
kiran
2009-05-21 20:36:12 +0000
b5ad5176f7 stick headers on the output tables
jmaguire
2009-05-21 20:35:50 +0000
83e1454a11 Added a method to determine the fraction of a sequence that's taken up by the most frequent base.
kiran
2009-05-21 20:35:31 +0000
bdf772f017 Added test for determining the fraction of a sequence that's taken up by the most frequent base (quick-and-dirty homopolymer testing).
kiran
2009-05-21 20:35:08 +0000
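Here is a minimal sketch of the most-frequent-base fraction used as a quick homopolymer check. The class and method names are hypothetical, and case-insensitive counting is an assumption. A value near 1.0 means a single base dominates the sequence.

```java
import java.util.HashMap;
import java.util.Map;

public class HomopolymerCheck {
    // Fraction of the sequence taken up by its most frequent base (case-insensitive).
    // Returns 0.0 for an empty sequence.
    static double mostFrequentBaseFraction(String seq) {
        if (seq.isEmpty()) return 0.0;
        Map<Character, Integer> counts = new HashMap<>();
        int max = 0;
        for (int i = 0; i < seq.length(); i++) {
            char base = Character.toUpperCase(seq.charAt(i));
            int count = counts.merge(base, 1, Integer::sum); // running count for this base
            max = Math.max(max, count);
        }
        return (double) max / seq.length();
    }
}
```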
d61a5261c1 Better integration of reference-ordered data into the data sharding system.
hanna
2009-05-21 20:09:32 +0000
0d58e4ccc9 check original alignments for indels when computing mismatch score; move logging to debug
ebanks
2009-05-21 19:55:42 +0000
5f67914b08 Added loads of documentation.
kiran
2009-05-21 19:40:47 +0000
1a9d5cea29 Added a method to reverse-complement a String object, preserving 'N' and '.' bases.
kiran
2009-05-21 19:39:39 +0000
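A reverse complement reads the sequence from its last character to its first and complements each base. This hypothetical sketch handles upper-case bases only. It passes 'N', '.', and any other non-ACGT character through unchanged, which is a broader rule than the commit states.

```java
public class ReverseComplement {
    // Complements an upper-case base; 'N', '.', and other characters are returned unchanged.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return base;
        }
    }

    // Reads the input from its last character to its first, complementing each one.
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }
}
```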
1a3ca97d29 remove the ivy command for dependency on BCEL; we're not using it right now.
aaron
2009-05-21 19:35:53 +0000
a687c6bc03 Added a method to refresh an NFS mount point (necessary to prevent NFS flakiness when running on the LSF farm).
kiran
2009-05-21 19:31:54 +0000
324ef9cbd1 Test class for PathUtils.
kiran
2009-05-21 19:31:22 +0000
8515247575 Adding some functions I keep reinventing, especially for testing purposes.
aaron
2009-05-21 19:30:44 +0000
e6200fe5b5 don't ignore reads when maxReadLength isn't set; also, print out LOD score for cleaning
ebanks
2009-05-21 19:24:10 +0000
0219d33e10 QualityUtils: added reverse function to reverse an array of bytes (and not complement it); BaseUtils: split qualToProb into itself and qualToErrProb; CovariateCounterWalker and LogisticRecalibrationWalker: several changes, including properly accounting (only partly complete) for reversing AND complementing bases that are on the negative strand; PrintReadsWalker: created option to output reads to a BAM file rather than just to the screen (useful for creating a downsampled BAM file)
andrewk
2009-05-21 18:30:45 +0000
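Phred-scaled qualities map to error probabilities as 10^(-Q/10), so Q20 means a 1% chance that the base call is wrong. The probability that the call is correct is one minus that value. Reads on the negative strand are stored reverse-complemented in SAM. Their quality scores therefore need only reversing to return to sequencing-cycle order, while their bases need reversing and complementing. The sketch below shows the probability split and a plain array reversal. `QualSketch` and its method signatures are hypothetical, not the QualityUtils/BaseUtils API.

```java
public class QualSketch {
    // Phred scale: error probability = 10^(-Q/10).
    static double qualToErrProb(byte qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    // Probability that the base call is correct.
    static double qualToProb(byte qual) {
        return 1.0 - qualToErrProb(qual);
    }

    // Returns a reversed copy of the array; values are not otherwise changed (no complementing).
    static byte[] reverse(byte[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[values.length - 1 - i];
        }
        return out;
    }
}
```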
7e77c62b49 auxiliary class, a simple struct to keep together info like numbers of covered, assessed, ref/variant bases across the sample
asivache
2009-05-21 16:30:16 +0000
7e5e422591 ReferenceOrderedData now inspects the ROD class using reflection. If the ROD declares a static Iterator<ROD> createIterator(String rodName, File rodFile) factory method, it is wrapped and used by ReferenceOrderedData to read records from rodFile. If the ROD does not provide such a factory method, the old behavior is the default: ReferenceOrderedData uses its own simple default iterator to read the file line by line (assuming there is only one line per record/position).
asivache
2009-05-21 15:23:22 +0000
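The reflection check described above can be sketched with `Class.getMethod` and `Modifier.isStatic`. This illustrates the idea and is not ReferenceOrderedData's code. `getMethod` only finds public methods, so the sketch assumes the factory is public. The `FactoryLookup`, `WithFactory`, and `WithoutFactory` classes are hypothetical.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Iterator;

public class FactoryLookup {
    // Returns the public static createIterator(String, File) factory of rodClass, or null if absent.
    static Method findIteratorFactory(Class<?> rodClass) {
        try {
            Method m = rodClass.getMethod("createIterator", String.class, File.class);
            boolean usable = Modifier.isStatic(m.getModifiers())
                    && Iterator.class.isAssignableFrom(m.getReturnType());
            return usable ? m : null;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    // Uses the class's own factory when present, otherwise the supplied default iterator.
    static Iterator<?> createIterator(Class<?> rodClass, String rodName, File rodFile,
                                      Iterator<?> defaultIterator) {
        Method factory = findIteratorFactory(rodClass);
        if (factory == null) return defaultIterator;
        try {
            return (Iterator<?>) factory.invoke(null, rodName, rodFile); // null receiver: static call
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("createIterator failed for " + rodClass.getName(), e);
        }
    }

    // Hypothetical ROD types used only to exercise the lookup.
    public static class WithFactory {
        public static Iterator<String> createIterator(String rodName, File rodFile) {
            return Collections.singletonList(rodName + ":" + rodFile.getName()).iterator();
        }
    }

    public static class WithoutFactory { }
}
```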
26dd3cd50e Cleanup. Move filtering functions closer to where they're used.
hanna
2009-05-20 21:42:48 +0000
e7a6f8cdc4 Removed evidence of a previous incarnation of data sharding.
hanna
2009-05-20 20:48:33 +0000
3cad580655 Catch and rethrow the walker's required argument, so that command-line arguments will be displayed when the GATK throws an argument exception.
hanna
2009-05-20 19:17:16 +0000
dc748d9c9c Integrate more feedback on command-line argument system. Focus on help formatter: separate required from optional but otherwise keep ordering the same, reorder GATK arguments by usage.
hanna
2009-05-20 19:01:25 +0000
34f9820299 update mapping quality score and edit distance attribute for reads when they are cleaned
ebanks
2009-05-20 17:51:31 +0000
57918de753 add the @Requires for this walker
ebanks
2009-05-20 17:03:12 +0000
747521c849 Fixed the simplest of typos.
kiran
2009-05-20 16:00:30 +0000
e48078b476 Updated to reflect change to BasecallingReadModel constructor.
kiran
2009-05-20 15:43:26 +0000
505f588768 Forgot to say that the mate is unmapped too. This is necessary to prevent SAM-JDK from yelling at me about an invalid SAM file.
kiran
2009-05-20 15:38:51 +0000
96e73e496a Delete deprecated old-school traversals.
hanna
2009-05-20 14:57:17 +0000
3b1f84e15b Slightly improved interface to merging utility for multiple bam files
depristo
2009-05-20 12:54:41 +0000
b840dd1320 Added some code to change the instrumentation for tests.
aaron
2009-05-20 05:15:27 +0000
c34eaa6962 add javassist, which is a higher-level alternative to BCEL.
aaron
2009-05-20 05:11:03 +0000
6c5fbb988b Now basecalls an entire read (both ends of the pair, barcode... everything) at once. Afterwards, RawRead and FourProbRead can be asked to return a specified subset (corresponding to the ranges specified for each end of the read).
kiran
2009-05-20 00:09:20 +0000
e293d65ede Refactored to allow the user to specify the range of cycles they wish to call. Simply specify a single range (e.g. '0-75') or two ranges ('0-75,76-151'). This allows single and paired-end read processing to coexist happily. Also implements annotation of an aligned bam file (which should hopefully fit in under two gigs now, but I'm waiting on a bug fix or a clarification from the Picard team).
kiran
2009-05-20 00:07:24 +0000
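Parsing the cycle-range argument could look like the following sketch. The class name `CycleRanges` is hypothetical. Treating both ends of a range as inclusive is an assumption, suggested by '0-75,76-151' describing two 76-cycle reads.

```java
import java.util.ArrayList;
import java.util.List;

public class CycleRanges {
    // Parses "0-75" or "0-75,76-151" into inclusive [first, last] cycle pairs.
    static List<int[]> parse(String spec) {
        List<int[]> ranges = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] bounds = part.trim().split("-");
            if (bounds.length != 2) throw new IllegalArgumentException("Bad cycle range: " + part);
            int first = Integer.parseInt(bounds[0].trim()); // NumberFormatException is an IllegalArgumentException
            int last = Integer.parseInt(bounds[1].trim());
            if (first < 0 || last < first) throw new IllegalArgumentException("Bad cycle range: " + part);
            ranges.add(new int[]{first, last});
        }
        return ranges;
    }

    // Number of cycles covered by an inclusive range.
    static int length(int[] range) {
        return range[1] - range[0] + 1;
    }
}
```

One range means single-end processing and two ranges mean paired-end, which is how both modes can share one argument.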
08c9f4d86b Renamed to BasecallingTrainer.
kiran
2009-05-20 00:03:46 +0000
01a3cb27c7 @Required / @Allows flags for main arguments.
hanna
2009-05-19 23:26:17 +0000
40dbc21df7 Moved ParseException to its own file and made it public.
kiran
2009-05-19 14:42:44 +0000
ff798fe483 Reintroduce support for interval-based traversals.
hanna
2009-05-18 22:54:18 +0000
e9f85ef920 Better merge support
depristo
2009-05-18 21:18:51 +0000