fd20f5c2e8For a file or files backed by a ROD implementing AllelicVariant, outputs a VCF file summarizing the information. Metadata like Hapmap and dbSNP membership, genotype LOD, read depth, etc., are annotated appropriately. The results output by this program are equivalent to those given by Gelis2PopSNPs.py.
kiran
2009-09-08 06:12:18 +0000
4a95f2181dprint out the right variant
ebanks
2009-09-08 01:37:35 +0000
5791da17aeUpdated to reference HLA database of unique 4 digit alleles
sjia
2009-09-07 22:12:56 +0000
e716f9337dA few more additions; almost done...
ebanks
2009-09-07 01:50:22 +0000
5dbba6711cLots of changes: (I'll send email out in a sec) 1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it). 2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing). 3) Have indel rod print samples
ebanks
2009-09-07 01:12:09 +0000
1c3d67f0f3Improvements to the CountCovariates and TableRecalibrator, as well as regression tests for SLX and 454 data
depristo
2009-09-04 22:26:57 +0000
2b0d1c52b2General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG
depristo
2009-09-04 19:13:37 +0000
0cc634ed5d-Renamed rodVariants to RodGeliText -Remove KGenomesSNPROD -Remove rodFLT -Renamed rodGFF to RodGenotypeChipAsGFF -Fixed a problem in SSGenotypeCall -Added basic SSGenotype Test class -Make VCFHeader constructors public
aaron
2009-09-04 18:40:43 +0000
fd1c72c151Fixed package name
ebanks
2009-09-04 15:40:06 +0000
82d99cbe43Remove dir
ebanks
2009-09-04 15:13:02 +0000
6c476514f8Moved to core. Wiki pages are going up; unit tests will be written soon.
ebanks
2009-09-04 15:09:11 +0000
42c71b4382Fix for Kris: now SNPs aren't masked by default (only when they come from a mask rod) and we can design Sequenom validation assays for them. I'll move this all to core in a bit...
ebanks
2009-09-04 14:52:06 +0000
849dce799dThis rod was all wrong for generating the alternate snp alleles (it returned null or even the wrong value); fixed.
ebanks
2009-09-04 14:21:46 +0000
a08c68362eRenaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* compares the geli MD5 sum to the expected one!
depristo
2009-09-04 12:39:06 +0000
3c2ae55859changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon.
aaron
2009-09-04 05:31:15 +0000
2241173fffIn order to help learn python, I decided to convert Michael's DoC python script to Java; the CoverageHistogram now spits out standard deviations for a good Gaussian fit. This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly.
ebanks
2009-09-04 02:23:57 +0000
544900aa99Migration of some core calculations (log-likelihood probabilities, etc.) from CoverageAndPowerWalker into static methods in PoolUtils
chartl
2009-09-03 21:43:29 +0000
c849282e44reverting the HLA walker changes
aaron
2009-09-03 19:11:57 +0000
5202d959bfNM attribute changed in sam jdk (?) from Integer to Short, or maybe it is presented differently by the reader depending on whether SAM or BAM is processed; in any case, both Integer and Short are safe now
asivache
2009-09-03 19:03:32 +0000
ada4c5a13cSmall change to debug printing code
sjia
2009-09-03 18:31:21 +0000
c3aaca1262Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into its own SAMFileReader-like entity.
kiran
2009-09-03 17:20:16 +0000
499b3536a4Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping"
asivache
2009-09-03 16:45:07 +0000
61d4dd4d01Remove playground version
ebanks
2009-09-03 15:45:26 +0000
5bd99fc1c4VariantFiltration moved to core. Another win for the team.
ebanks
2009-09-03 15:41:41 +0000
3ac5ac066fChecking in Michael's DoC parameterization script; this functionality will eventually be moved into VariantFiltration
ebanks
2009-09-03 15:07:49 +0000
515fc7c476overaggressively removed the STD outputs, back in for tests
aaron
2009-09-03 15:07:45 +0000
7d0a13d711added options for building with xml output to files
aaron
2009-09-03 15:00:27 +0000
d804a119dcscript to run the complete pilot2 pipeline: from cleaning to calling to filtering [not quite finished though]
ebanks
2009-09-03 14:35:55 +0000
bdd0a6f9fachange to make build work
depristo
2009-09-03 13:43:10 +0000
b01ac9de0cHigh performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and potentially more) speed ups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still being tested but passes validating pileup.
depristo
2009-09-03 03:06:25 +0000
e2780c17afCheckin of the Multi-Sample SNP caller.
jmaguire
2009-09-03 00:23:28 +0000
e2a79c5cd9Checkpoint. The BWT that we generate now matches the first 16% of the BWT that BWT-SW generates. Cleaned up output streams to separate the byte packing / word packing from the data structure generation.
hanna
2009-09-02 22:18:17 +0000
3dfc77dc89Add an indel rod which represents the initial point of the indel only (useful for alternate reference making)
ebanks
2009-09-02 19:32:29 +0000
58debd7e56A convenience shortcut isReadUnmapped() added: thanks to SAM format specification, 'read unmapped' flag is not always required to be set for an unmapped read; this method checks both the flag and the alignment reference index/start (if those are set to '*' the flag is not required according to the spec!)
asivache
2009-09-02 17:00:39 +0000
0e6feff8f2fixed locus pile-up limiting problem
aaron
2009-09-02 16:56:44 +0000
d8aff9a925Bug fixes. Was ignoring the '$' character in a few places where I shouldn't have been.
hanna
2009-09-02 16:27:31 +0000
55013eff78Re-revert back to point estimation for now. We need to do this right, just not yet. Also, it's safer to let colt do the log factorial calculations for us.
ebanks
2009-09-02 15:33:18 +0000
eb664ae287Added VariantFiltrationWalker to GATK early release.
hanna
2009-09-02 02:17:50 +0000
1ada085970Cruddy implementation of BWT creation, for understanding and testing purposes.
hanna
2009-09-02 02:16:56 +0000
24d809133dOops - comment out the printouts
ebanks
2009-09-02 01:45:56 +0000
91ccb0f8c5Revert to having these filters use integration over binomial probs
ebanks
2009-09-02 01:40:22 +0000
05c164ec69changing the default behavior to allow any sized read pile-up (which may exceed the memory limit); the user can then select their own read limit. The default of 100K was arbitrary.
aaron
2009-09-01 14:46:00 +0000
54c0b6c430Allow this ROD to consist of just the positions
ebanks
2009-09-01 12:43:18 +0000
4a1d79cd7badded a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads. The user is warned if a locus exceeds this threshold, and no more reads are added.
aaron
2009-09-01 04:21:58 +0000
0addae967aIndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position
ebanks
2009-09-01 03:34:39 +0000
85ca68fab6Initial version: creates a packed file from a fasta, suitable for consumption by BWT-SW. Works with E coli fasta, but will not work at this moment with multi-chr fastas. Will be made into a utility routine when BWA comes together.
hanna
2009-08-31 18:39:19 +0000
591f8eedbbAdded setName() and getName() (however, not used anywhere yet). Now can set the name of the fasta record manually to whatever, however it will work only if done early enough. If the fasta record already started printing itself (i.e. the header line is already done), setName() will throw an exception. Could be too entangled, may reverse this back...
asivache
2009-08-31 18:09:55 +0000
c9eb193c7fNow recognizes a special name for a bound rod track: snpmask. If a rod with this name is bound, then ONLY snps from that track will be used (to set alt reference bases to N's), but indels will be ignored. This helps when an alt. ref has to be created for a set of indel calls, and another rod (e.g. dbSNP) is used to put N's in (for sequenom). If dbSNP rod is not marked as "snpmask", the indels reported there will make their way into the alt. reference output and mess it up.
asivache
2009-08-31 18:05:57 +0000
8e3c3324faAdded filter for SNPs cleaned out by the realigner. It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work.
ebanks
2009-08-31 04:32:32 +0000
463f80c03eRequire each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context
ebanks
2009-08-31 03:37:24 +0000
1a299dd459Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context
ebanks
2009-08-31 03:31:37 +0000
e70101febcAdd a VEC filter for clustered SNP calls that takes advantage of the new windowed approach; delete the old standalone walker.
ebanks
2009-08-31 03:14:42 +0000
215e908a11Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters. This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible.
ebanks
2009-08-31 02:16:39 +0000
2402dcd4c9Give usage message if no arguments provided.
andrewk
2009-08-31 00:28:43 +0000
813a4e838fRemoving old code
depristo
2009-08-30 19:27:11 +0000
49a7babb2cBetter organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory).
depristo
2009-08-30 19:16:30 +0000
522e4a77aeCaching support across multiple technologies
depristo
2009-08-30 18:10:14 +0000
5af4bb628bIntermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup
depristo
2009-08-30 17:34:43 +0000
bde67428fdBetter formatting of the code
depristo
2009-08-29 21:46:47 +0000
6c604af86cNicer building of scala programs
depristo
2009-08-29 16:41:56 +0000
8331c195fbchanged the full name of maximum_reads to maximum_iterations for consistency
aaron
2009-08-28 16:03:46 +0000
8e129d76fdSupport for original quality scores OQ flag. pQ flag in TableRecalibration to preserve quality scores below a threshold (defaulting to 5)
depristo
2009-08-28 14:14:21 +0000
f0179109faRemoving min confidence for on/off genotype
depristo
2009-08-28 01:04:13 +0000
dc9d40eb9aNow requires a minimum genotype LOD before applying tests
depristo
2009-08-28 00:19:23 +0000
37a9b84276corresponding test
depristo
2009-08-28 00:17:42 +0000
bf60980653Experimental support for empirical P(B_true | B_miscall). --useEmpiricalTransitions flag to SSG enables this support. Much better implementation of Genotype likelihoods -- the system should scream along now. Continuing progress towards deleting old model
depristo
2009-08-28 00:17:24 +0000
ab9458d06dsupport for scala walkers
depristo
2009-08-28 00:15:01 +0000
7cf9a54b64change for new char/byte in BaseUtils
depristo
2009-08-27 23:47:56 +0000
a639459112Trivial consistency change from char in to char out, not char in to byte out
depristo
2009-08-27 23:37:37 +0000
6012f7602b@ minor fixes to CoverageAndPowerWalker and AnalyzePowerWalker (switching to By Reference traversal, spitting out Syzygy position for sanity check)
chartl
2009-08-27 21:44:18 +0000
bd1e679bc5@ Fixed issues with AnalyzePowerWalker which depended on CoverageAndPowerWalker. The latter was changed but not the former. Now fixed
chartl
2009-08-27 20:23:41 +0000
a17dad5fa9Converts from fastq.gz to unaligned BAM format. Accepts a single fastq (for single-end run) or two fastqs (for paired-end run). Also allows you to set certain BAM metadata (read groups, etc.).
kiran
2009-08-27 20:20:09 +0000
8740124cda@ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0.
chartl
2009-08-27 19:31:53 +0000
92ea947c33Added binomProbabilityLog(int k, int n, double p) to MathUtils:
chartl
2009-08-25 21:27:50 +0000
478f426727Fixed a missing method implementation in these two files.
kiran
2009-08-25 21:21:58 +0000
f12ea3a27eAdded ability for all filters to return a probability for a given variant - interpreted as the probability that the given variant should be included in the final set. The joint probability of all the filters is computed to determine whether a variant should stay or go. At the moment, this is only visible in verbose mode (specify -V). Also removed 'learning mode'; now, filters emit important stats no matter what. Various code cleanups.
kiran
2009-08-25 21:17:56 +0000
e5115409faForce columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA.
hanna
2009-08-25 19:54:54 +0000
811503d67bvcf changes from Richard's comments, fixed a test case
aaron
2009-08-25 14:32:16 +0000
ee05ddde16Added command line options to make the barcode analysis script executable by end users.
andrewk
2009-08-24 21:15:09 +0000
ccdb4a0313General-purpose management of output streams.
hanna
2009-08-23 00:56:02 +0000
b316abd20fcatch a malformed column header name more gracefully
aaron
2009-08-21 21:05:28 +0000
0364f8e989added the ability of the VCFReader to take in compressed gzipped files natively, which is really useful for the validator
aaron
2009-08-21 18:40:38 +0000
647a367680Made the size zero interval file checker emit a warnUser if we're not in unsafe mode.
aaron
2009-08-21 14:40:57 +0000
df9133c90bthe doc on File.length states it returns 0L if it doesn't exist, added a check to make sure it exists (and length < 1)
aaron
2009-08-21 05:55:17 +0000
cd711d7697Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric.
aaron
2009-08-21 05:35:49 +0000
0bdecd8651A most stupid bug. In cases when more than one indel variant was present in cleaned bam file, the "consensus" (max. # of occurrences) call was computed incorrectly, and most of the time the call itself was not made at all. Fortunately, the locations where we see multiple indels are a minority, and many of them are suspicious anyway (manifestation of alignment problems?). Could change results of POOLED calls though.
asivache
2009-08-20 22:31:44 +0000