25365ccb07Adding more walkers to the base GATK package.
hanna
2009-07-14 20:40:52 +0000
efcbb16688un-deprecate this ROD and make it implement Genotype
ebanks
2009-07-14 19:45:41 +0000
84d407ff3fFixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
depristo
2009-07-14 18:53:27 +0000
76b09a879bDisplay a more intelligent error message if the user runs a locus traversal across an unmapped reads file.
hanna
2009-07-14 18:36:09 +0000
99ddd8ab15bug fix for transitioning between chromosomes in GLF output
aaron
2009-07-14 17:58:04 +0000
7d755a4c90GenotypeLikelihoods doesn't emit metrics, they don't make sense
aaron
2009-07-14 17:22:28 +0000
01fc8da270adding the GenotypeLikelihoodsWalker, which generates GLF genotype likelihoods that are pretty much identical to the samtools calls.
aaron
2009-07-14 16:57:18 +0000
99f9cd84edWarning for possibly mismatched reads / reference was very aggressive. Relax the criteria a bit.
hanna
2009-07-14 16:21:22 +0000
12b5d9c70cThe number of loci can easily overflow an int. Change reduce type to a Long.
hanna
2009-07-14 16:07:00 +0000
5bf76474980.2.3 -- now preserves Q0 bases throughout the reads
depristo
2009-07-14 12:27:31 +0000
36819ed908Initial changes to the SSG to output GLF by default
aaron
2009-07-14 08:46:04 +0000
0f6bfaaf73Skip validation in case of no reads aligning.
hanna
2009-07-14 02:03:36 +0000
a1d33f8791-Added walker to dump strand test results to file -Refactored strand filter to handle calls from the walker
ebanks
2009-07-14 01:56:50 +0000
bfe90af5e2Some quick and dirty fixes to support querying unmapped BAM files.
hanna
2009-07-14 01:25:20 +0000
e4152af387added a big speed-up for interval list input processing. With large interval sets this was taking way too long...
aaron
2009-07-13 22:00:00 +0000
9f0fb9f3aaFix for GSA-90: GATK banner and error messages should point to the wiki website.
hanna
2009-07-13 21:56:41 +0000
b18caa2052Fix for GSA-90: System isn't failing with an error when you use the wrong reference.
hanna
2009-07-13 20:42:12 +0000
52659d02d4ignore unmapped reads in all the indel walkers (since they're giving me overhead issues)
ebanks
2009-07-13 16:51:11 +0000
5c321f9630Oops! Accidentally deactivated the ArgumentFactory, needed by the CleanedReadInjector, while refactoring last night.
hanna
2009-07-13 16:41:55 +0000
b61f9af4d7Cleaning up, preparing to incorporate a better fix for Eric's problems with validation stringency in BAM files opened directly from the walkers.
hanna
2009-07-13 01:42:13 +0000
4c02607297genotyper also needs to have 454 reads filtered out
ebanks
2009-07-10 23:19:28 +0000
dea72c576euse the filter to ignore 454 reads in the traversal to speed up cleaning (since there's less area to actually clean against)
ebanks
2009-07-10 18:34:44 +0000
0070b8ea6aUntil 454 goes far, far away, at least we can completely ignore it
ebanks
2009-07-10 18:31:53 +0000
1401606344move warning about strictly adjacent intervals in a contig from 'remap' to 'read', so it is issued only once
asivache
2009-07-10 17:58:11 +0000
aa4f60d980Make sure that only reads marked as 'mapped' are filtered based on validity of alignment.
hanna
2009-07-10 17:44:06 +0000
e01d37024anow updates mapping quality (to an arbitrary chosen value of 37 if the resulting mapping is unique) and X0, X1 tags after remapping (in REDUCE mode)
asivache
2009-07-10 16:40:52 +0000
a1eb128377few more detailed debug printouts conditioned on if (DEBUG), so no real changes...
asivache
2009-07-10 16:36:57 +0000
08c4fb86e3Derive examples from real data.
hanna
2009-07-10 04:21:37 +0000
03e1713988Better support for specifying read filters to apply directly from the walkers.
hanna
2009-07-09 23:59:53 +0000
ce08f5f0c3Removed some unused variables, fixed some javadoc. The usual.
aaron
2009-07-09 22:10:22 +0000
9cfd89c54fa small refactoring, and some documentation cleanup
aaron
2009-07-09 22:03:45 +0000
d86717db93Refactoring of the traversal engine base class, I removed a lot of old code.
aaron
2009-07-09 21:57:00 +0000
3519323156Output the correct geli text format
ebanks
2009-07-09 19:45:18 +0000
99631cdaa1fix and then deprecate the rodGELI class (GELIs suck)
ebanks
2009-07-09 19:18:13 +0000
60a86fb34aBetter handling of fasta files with non-standard extensions.
hanna
2009-07-09 18:18:48 +0000
5e26770634Hack the MicroScheduler to be tolerant of RefWalkers. We need to implement a longer-term solution to make it easier for datasources to report problems they've encountered along the way (GSA-103).
hanna
2009-07-09 17:26:59 +0000
3fe7104963Added walker to filter out clustered SNPs from a call set
ebanks
2009-07-09 03:16:27 +0000
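
The log does not say how the clustered-SNP walker in 3fe7104963 decides that SNPs are clustered. As a minimal sketch of one common approach, assuming a sliding window over sorted positions on a single contig: the function name, `window`, and `cluster_size` below are hypothetical and are not taken from the GATK source.

```python
def clustered_snp_flags(positions, window=10, cluster_size=3):
    """Flag SNPs that belong to a cluster.

    positions: sorted SNP positions on ONE contig.
    window, cluster_size: illustrative thresholds (assumptions, not GATK defaults).
    A SNP is flagged when at least `cluster_size` SNPs start within
    fewer than `window` bases of each other.
    """
    flags = [False] * len(positions)
    start = 0
    for end in range(len(positions)):
        # Shrink the window from the left until it spans fewer than `window` bases.
        while positions[end] - positions[start] >= window:
            start += 1
        if end - start + 1 >= cluster_size:
            for i in range(start, end + 1):
                flags[i] = True
    return flags


calls = [100, 103, 105, 500, 900, 905]
flags = clustered_snp_flags(calls)
kept = [pos for pos, clustered in zip(calls, flags) if not clustered]
print(kept)  # [500, 900, 905]
```

Calls would need to be grouped by contig before this runs, since positions on different contigs are not comparable.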
8ee5c7de8eGLF reader and writer check in.
aaron
2009-07-08 23:06:37 +0000
c8fcecbc6fAdded ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
andrewk
2009-07-08 22:04:26 +0000
3f0304de5aGet rid of unused iterator.
hanna
2009-07-08 20:39:16 +0000
da4d26b1eaEnum support for command-line argument system, and some cleanup for hacks to the CleanedReadInjector that were required because Enum support was missing.
hanna
2009-07-08 20:26:16 +0000
aacec3aeb0rod for binary GELI files (still needs to be tested)
ebanks
2009-07-08 20:25:56 +0000
e106cf73d8A quick change to provide more verbose output.
aaron
2009-07-08 19:08:19 +0000
433ad1f060Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
hanna
2009-07-08 18:49:08 +0000
d8fbb2b62cRefactoring; make a better home for the MalformedReadFilteringIterator.
hanna
2009-07-08 16:54:20 +0000
c78a72e775Applies Fisher's Exact Test to determine whether there's a strand bias and, if so, filters the call out.
kiran
2009-07-08 16:14:11 +0000
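
For the strand-bias filter in c78a72e775, a two-sided Fisher's Exact Test on a 2x2 allele-by-strand table can be sketched as follows. This is an illustration only: the table layout, the function names, and the `alpha` cutoff are assumptions, not the filter's actual parameters.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the table [[a, b], [c, d]].

    Assumed layout: rows are alleles (ref, alt), columns are strands (forward, reverse).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def has_strand_bias(a, b, c, d, alpha=0.01):
    """True when the p-value falls below alpha (alpha is an illustrative cutoff)."""
    return fisher_exact_two_sided(a, b, c, d) < alpha


print(has_strand_bias(10, 0, 0, 10))   # ref only on forward, alt only on reverse
print(has_strand_bias(10, 10, 10, 10)) # perfectly balanced
```

A call would be filtered out when `has_strand_bias` returns True for its counts.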
b211f500a3Applies secondary base feature to variants.
kiran
2009-07-08 16:13:29 +0000
6e31057e6bSome changes involving output of marginal calls to different, per-filter files.
kiran
2009-07-08 16:12:57 +0000
787c84d68bonly compare pair position for paired end reads
ebanks
2009-07-08 04:07:08 +0000
d3daecfc4dAdded unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, updated indices in CoverageEval.py, and made a lot of changes to CoverageWalker, the biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
andrewk
2009-07-08 02:05:40 +0000
4ba2194b5eFilter reads whose alignment starts past the end of the contig to which it allegedly aligns.
hanna
2009-07-07 22:27:44 +0000
194b75613bFix compile problem with unit tests.
hanna
2009-07-07 20:29:31 +0000
1d84c9da96sortByRef now supports x:y location syntax
depristo
2009-07-07 16:42:40 +0000
1db15ee468made some things protected so that I can inherit them in MultiSampleCallerAccuracyTest
jmaguire
2009-07-07 15:50:28 +0000
1fa71aa31dNow outputs stats. Doesn't do the downsampling thing because I think I'll have enough counts.
jmaguire
2009-07-07 15:29:31 +0000
5d7393d7cbTemporary fix for Eric's problems with SOLiD reads: make sure the command-line argument system takes the --validation-strictness command-line argument into account when creating SAMFileReaders.
hanna
2009-07-07 15:18:05 +0000
f6a273a537other fixes for some broken unit tests
aaron
2009-07-07 05:53:13 +0000
033bafe7a1fixed sam by reads test for the new filtering code
aaron
2009-07-07 05:45:50 +0000
2a86f2f833an initial pass at the GLF reader, and some other genotype changes to phase out the LikelihoodObject I created.
aaron
2009-07-07 04:30:27 +0000
5735c87581Basic infrastructure for filtering malformed reads.
hanna
2009-07-06 22:50:22 +0000
b9d533042eTwo-tailed HardyWeinberg test implemented. VariantEval now separates violations from summary outputs for clarity; Fixing problems with CovariateCounterTest and TabularRodTest
depristo
2009-07-06 22:02:04 +0000
31313481f6Temporary patch to filter out bad alignments that aren't quite fully reported as bad.
hanna
2009-07-06 18:41:55 +0000
6580211c2aFirst version of depth of coverage filter. Right now it takes in a maximum coverage threshold given by the user.
mmelgar
2009-07-06 18:22:46 +0000
fac7ac5142Don't print out 0 coverage (which is always 0)
ebanks
2009-07-06 17:44:32 +0000
d19366eaadCleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes.
hanna
2009-07-06 15:41:30 +0000
338cdbebaddeal with screwy solid reads in the cleaner (no cigar strings)
ebanks
2009-07-05 16:49:58 +0000
8bcbf7f18aFirst draft of multi sample caller accuracy test.
jmaguire
2009-07-05 16:29:13 +0000
4019cd2bd7Added ROD for parsing hapmap3 genotype files. Tweak to TabularROD to allow HapMapGenotypeROD to work. Added HapMapGenotypeROD to list of RODs in ReferenceOrderedData.java. Modified MultiSampleCaller to return a single object with most of the relevant information.
jmaguire
2009-07-05 16:28:24 +0000
e5e249d4actemporary fix to deal with screwy SOLiD reads
ebanks
2009-07-05 03:25:57 +0000
cf1854b339Fix for monstrous problems with solid data -- now can dynamically expand recalibration tables on the fly as reads declare additional read groups -- use assumeFaultyHeader flag
depristo
2009-07-03 17:15:49 +0000
0d00823332Fix for performance bug in extending the read with X's in cases where the read is aligned off the end of the contig.
hanna
2009-07-03 16:17:38 +0000
be2f8478c0added suppression of failure messages
kcibul
2009-07-03 15:19:37 +0000
dcb8892568Lot of code for coverage evaluation tools including first version of python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement) which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
andrewk
2009-07-03 08:07:02 +0000
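
The idea in dcb8892568 of subsetting the same elements in both the reads and the offsets lists can be sketched like this. It is not the ListUtils code; the function names are hypothetical. Indices are drawn once, with replacement, and then applied to both parallel lists so that each read stays paired with its offset.

```python
import random


def sample_indices_with_replacement(n, k, rng):
    """Draw k indices from range(n), allowing repeats. Requires n > 0."""
    return [rng.randrange(n) for _ in range(k)]


def subset_by_indices(items, indices):
    """Pick the elements of `items` at the given indices, in that order."""
    return [items[i] for i in indices]


rng = random.Random(0)              # fixed seed so the example is repeatable
reads = ["r0", "r1", "r2", "r3"]    # stand-ins for read objects
offsets = [5, 9, 2, 7]              # offset of the locus within each read

idx = sample_indices_with_replacement(len(reads), 6, rng)
sub_reads = subset_by_indices(reads, idx)
sub_offsets = subset_by_indices(offsets, idx)
```

Sampling each list independently would break the pairing, which is why the indices are shared.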
d0cef5ff9dOops. Specified incorrect classname in package for depth of coverage walker.
hanna
2009-07-02 21:40:40 +0000
d603145cb0Meaning of input arguments has CHANGED: minFraction is now a minimum fraction of CONSENSUS indel observation, out of all reads covering the site, required to make the call. minConsensusFraction is still the minimum fraction of CONSENSUS indel observation out of all indel observations at the site
asivache
2009-07-02 20:38:10 +0000
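
The two thresholds described in d603145cb0 use different denominators, which a small sketch may make easier to see. The function and its argument names are hypothetical; only the two ratio definitions come from the commit message. No default values are shown because the log does not give any.

```python
def passes_indel_thresholds(consensus_count, total_indel_count, coverage,
                            min_fraction, min_consensus_fraction):
    """Apply both indel-calling fractions as described in the commit message.

    consensus_count:   reads supporting the CONSENSUS indel
    total_indel_count: reads showing any indel at the site
    coverage:          all reads covering the site
    """
    if coverage == 0 or total_indel_count == 0:
        return False
    # minFraction: consensus indel observations out of ALL reads at the site.
    fraction_of_coverage = consensus_count / coverage
    # minConsensusFraction: consensus indel observations out of all INDEL observations.
    fraction_of_indels = consensus_count / total_indel_count
    return (fraction_of_coverage >= min_fraction
            and fraction_of_indels >= min_consensus_fraction)


# 8 of 10 indel reads agree, 20 reads cover the site: 0.4 of coverage, 0.8 of indels.
print(passes_indel_thresholds(8, 10, 20, 0.3, 0.7))  # True
```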
62807139fcCleanup pileup and depth of coverage in preparation for release. Add pileup, depth of coverage, and print reads to package for distribution.
hanna
2009-07-02 14:54:01 +0000
6a25f0b9c5refactored into new package
kcibul
2009-07-02 14:37:54 +0000
1c83b4d949forgot to take out some test code
aaron
2009-07-02 14:18:37 +0000
bc17ff567aWhen you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions. Now with a test case!
aaron
2009-07-02 14:15:50 +0000
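
The masking behaviour described in bc17ff567a can be illustrated with a short sketch. This is not the GATK implementation; the function name and the 1-based `start` convention are assumptions made for the example.

```python
def reference_for_read(contig_seq, start, length):
    """Return the reference bases under a read, padding with 'X' past the contig end.

    contig_seq: full contig sequence as a string
    start:      1-based alignment start of the read (assumed >= 1)
    length:     number of reference bases the read spans
    """
    begin = start - 1
    # Slicing past the end of a string simply yields fewer characters.
    in_contig = contig_seq[begin:begin + length]
    # Build the padding in one step rather than appending one 'X' at a time.
    return in_contig + "X" * (length - len(in_contig))


print(reference_for_read("ACGTACGT", 6, 5))  # CGTXX
```

A read that starts entirely past the contig end would come back as all X's here.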
47cb9f169eStable tool that's the reverse of merging -- splits a file into individual BAM files, one for each sample ID in the SAM header
depristo
2009-07-02 12:56:46 +0000
6684cb8bc9copySamFileHeader() utility function
depristo
2009-07-02 12:55:51 +0000
bb92eb8b1cadded a fix for overlapping reads in the locus context
aaron
2009-07-02 02:08:59 +0000
6570ce0b5badded the example files to the distro
aaron
2009-07-01 23:16:42 +0000
9d659199f3bam and fasta files generated with the artificial tools. These will be included in the GATK distro.
aaron
2009-07-01 22:49:31 +0000
d4d3af20f2made a fake fasta generator, so we can now generate a complete bam / fasta combo of made up data.
aaron
2009-07-01 21:35:34 +0000
c2e5a68aafoutput format changed in --verbose --somatic mode: now also prints the <#reads with indels>/<coverage> for normal samples, rather than only for the tumor; also, code cleaned up a little
asivache
2009-07-01 20:56:16 +0000
4cbf069de1First version of coverage evaluation tool
andrewk
2009-07-01 20:52:25 +0000
7462f3f344Bug in setContig() fixed: sequence dictionary's .getSequences().contains() and .getSequences().indexOf() do NOT work when applied to contig names (Strings), since getSequences() returns a list of SAMSequenceRecord's; changed to querying the dictionary directly for specified contig name
asivache
2009-07-01 20:50:09 +0000
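
The setContig() bug in 7462f3f344 is a type mismatch: a list of record objects is searched with a plain string. The sketch below reproduces the same pitfall with hypothetical stand-in classes (`SequenceRecord`, `SequenceDictionary`), which are not the Picard types, and shows the by-name lookup that avoids it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    """Stand-in for a sequence record: a contig name plus its length."""
    name: str
    length: int


class SequenceDictionary:
    """Stand-in dictionary that can be queried directly by contig name."""

    def __init__(self, records):
        self._records = list(records)
        self._index_by_name = {r.name: i for i, r in enumerate(self._records)}

    def get_sequences(self):
        return list(self._records)

    def get_sequence_index(self, name):
        # -1 signals "no such contig", mirroring a typical indexOf contract.
        return self._index_by_name.get(name, -1)


seq_dict = SequenceDictionary([SequenceRecord("chr1", 1000), SequenceRecord("chr2", 800)])

# Buggy pattern: a string never equals a record, so this is always False.
found_in_list = "chr2" in seq_dict.get_sequences()

# Fixed pattern: ask the dictionary for the contig by name.
index = seq_dict.get_sequence_index("chr2")
```

Here `found_in_list` is False even though the contig exists, while `index` is 1.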
76fd4b3848deal with different contigs
ebanks
2009-07-01 19:17:27 +0000
20fab507a8Choose the REF if it scores equal to consensus!
ebanks
2009-07-01 18:54:27 +0000
9b182e3063Prep for documenting command-line arguments: delete some arguments that don't make sense any more given the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework).
hanna
2009-07-01 18:23:35 +0000
5a5103cfd2Heads up, everyone: command-line args no longer need to be public.
ebanks
2009-07-01 16:09:22 +0000
b43d4d909eFix CleanedReadInjectorTest to work with new CleanedReadInjector.
hanna
2009-07-01 15:48:06 +0000