6313c465fbwe want the RMS of the read qualities, not the RMS of the RMS of the read qualities.
aaron
2009-08-20 21:56:29 +0000
6c0adc9145reuse fasta file reader
kcibul
2009-08-20 16:01:58 +0000
0386e110cfsome documentation changes, add a couple of simple checks
aaron
2009-08-20 05:20:27 +0000
026e09ec07adding the package description for the VCF validator
aaron
2009-08-20 04:46:27 +0000
10c98c418bWalker to determine the concordance of 2 genotype call sets.
ebanks
2009-08-20 01:32:44 +0000
1d74143ef4A convenience argument - for Mark - so that you don't have to specify all the output file names
ebanks
2009-08-20 00:49:12 +0000
5725de56dcfixes in VCF, some changes to get it ready to move out of the GATK
aaron
2009-08-19 23:31:03 +0000
0b927f44facreated a better separation between instantiation of a VCF object and the object itself
aaron
2009-08-19 20:32:50 +0000
ed8c92a12amake isReference do the right thing
ebanks
2009-08-19 20:32:29 +0000
21091b9839Fix for invalid format error when outputting BAM files.
hanna
2009-08-19 19:42:39 +0000
4cf9110468Adding a lot of changes to the VCF code, plus a new basic validator. Also removing an extra copy of the Artificial SAM generator that got checked in at some point.
aaron
2009-08-19 05:08:28 +0000
b3fe566c0cFix descriptions of walker args
ebanks
2009-08-18 19:46:48 +0000
26a6f816c9set default value for output format
ebanks
2009-08-18 16:17:09 +0000
53153fcd79Allow RODs to specify that incomplete records are okay (i.e. that they allow optional fields)
ebanks
2009-08-18 15:26:10 +0000
9b1d7921e8added filter based on concordance to another call set
ebanks
2009-08-18 15:16:30 +0000
b2a18a9d61- first pass at a basic indel filter (for now, based on size and homopolymer runs) - fix simple indel rod printout
ebanks
2009-08-18 03:04:12 +0000
78439f7305Modify Sequenom input format based on official documentation
ebanks
2009-08-18 01:42:57 +0000
63d90702d6another iteration of the VCFReader and VCFRecord, introducing the VCFWriter
aaron
2009-08-17 22:17:34 +0000
1e8b97b560quietly skip empty intervals files rather than crash.
jmaguire
2009-08-17 20:19:14 +0000
92c63fb530It's just "lod" not discovery_lod now.
jmaguire
2009-08-17 18:44:09 +0000
df5744bcd3update this walker so any variants can be passed in
ebanks
2009-08-17 16:30:39 +0000
8403618846the start to the VCF implementation
aaron
2009-08-17 04:34:15 +0000
d4808433a1Added option to output the locations of indels in the alternate reference
ebanks
2009-08-16 03:46:36 +0000
4b6ddc55bdMerge our 2 fastq writers into 1: incorporate Kiran's secondary-base file writer into the fasta/fastq writers
ebanks
2009-08-14 20:55:23 +0000
843d7e6c8fNow you can specify '-' instead of input file name, and the script will read from stdin
asivache
2009-08-14 20:30:56 +0000
0ec581080cRefactoring the code; also, now it prints continuously instead of potentially storing one long string.
ebanks
2009-08-13 01:32:46 +0000
2a01e71277A very simple standalone filter for fooling around with the data: can extract only mapped or only unmapped reads, only reads with mapping quals > X, reads with average base qual > Y, reads with min base qual > Z, reads with edit distance from the ref > MIN and/or < MAX
asivache
2009-08-12 20:28:51 +0000
ebec0ec171A standalone companion to BamToFastqWalker: does the same thing but without calling in gatk's heavy artillery (does not "require" a reference either). Extracts seqs and quals and places them into fastq; along the way it also reverse complements reads that align to the negative strand (so that fastq contains reads as they come from the machine).
asivache
2009-08-12 20:24:37 +0000
112a283f54be nice, don't forget to close the reader when done
asivache
2009-08-12 20:19:56 +0000
ba2a3d8a58Reverse qualities when read seq. is reverse complemented
asivache
2009-08-12 20:17:35 +0000
143f8eea4eoption to output in sequenom input format
ebanks
2009-08-12 16:50:37 +0000
7f1159b6a9Added option to mask out SNP sites with "N"s in the new reference. This is useful when producing Sequenom input files for validating indels...
ebanks
2009-08-12 15:17:45 +0000
43f63b7530Added a walker to convert a bam file to fastq format (including the option to re-reverse the negative strand reads). Picard has such a tool but it is geared towards their pipeline and requires intimate knowledge of the lanes/flowcells, etc. This is just easy.
ebanks
2009-08-12 15:10:40 +0000
d101c20b30added the ability to pass in a csv file of ROD triplets (one triplet per line) to the -B option
aaron
2009-08-11 22:10:20 +0000
e4acd14675Now GenomicMap maps (and RemapAlignment outputs) regions between intervals on the master reference as 'N' cigar elements, not 'D'. 'D' is now used only for bona fide deletions.
asivache
2009-08-11 21:10:17 +0000
2c3f56cb8dfix length calculation (it was including +/- char when it shouldn't)
ebanks
2009-08-11 20:28:24 +0000
5fab934f4e- moved the reference maker to its own directory - added first version of a more complicated reference maker which takes in RODs and creates an alternative reference based on the variants (indels and/or SNPs)
ebanks
2009-08-11 18:01:06 +0000
d69ae60b69fixed two tests affected by my previous commit
aaron
2009-08-11 17:57:50 +0000
fc1c76f1d2fixing a bug where reads in overlapping interval-based locus traversals could get assigned to only one of the two regions
aaron
2009-08-11 17:50:16 +0000
bb1d31914c2009_02 release is no longer with us. Update the bam list.
hanna
2009-08-11 12:49:23 +0000
1851613de4Now using larger database of HLA alleles
sjia
2009-08-11 03:11:14 +0000
0e7c158949I've pulled out the functionality of the analyzer into a single python file which doesn't require all of the irrelevant config parameters (which would cause problems for external users). I'll release this and the simple config file to 1KG for use in analyzing recalibration efforts.
ebanks
2009-08-11 02:56:43 +0000
dd228880edPartially implemented NewHotnessGenotypeLikelihoodsTest caused the tests to fail. Ouch! So hot it burned me.
hanna
2009-08-10 20:45:44 +0000
3208eaabccA standalone picard-level tool for breaking individual reads into "pairs" of first/last N bases. Supports: * splitting off only start or end of the read, or both; the output will contain chopped sequences AND corresponding base qualities * splitting arbitrary number of bases off each end (different numbers for left and right segments can be specified; segments can overlap) * splitting only unmapped reads, ignoring mapped ones * writing split ends into separate sam/bam files, or into a single output file * decorating original read names with user-specified suffixes for each end (e.g. _1 and _2 for left and right parts of the read); default: no decoration, original read names are used * when mapped reads are split, the alignment cigars are chopped appropriately and the alignment start positions are adjusted (for the right end) to correctly specify the alignment of the selected part of the read
asivache
2009-08-10 20:42:49 +0000
ecae619a1bwarn user when dbSNP rod looks suspicious
ebanks
2009-08-10 20:20:20 +0000
2841e151d0javadoc comments only
asivache
2009-08-10 18:44:35 +0000
921d4f4e95RemapAlignments is a standalone picard-level tool that does not use gatk engine; moved to 'tools'
asivache
2009-08-10 15:41:07 +0000
b7768830c5Tiny reorganization in the playground: a place for 'picard-level' standalone tools that are not based on gatk
asivache
2009-08-10 15:07:35 +0000
02f1af0743Don't die when a readgroup is absent from the covariates table - it could happen when all reads are unmapped (or have MQ0); instead, just don't alter the quals.
ebanks
2009-08-10 03:10:33 +0000
089dab00e2Was discordance rate, now concordance rate
depristo
2009-08-07 19:37:52 +0000
6d3ef73868Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T
depristo
2009-08-07 19:37:07 +0000
20baa80751Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
depristo
2009-08-07 19:01:04 +0000
a864c2f025Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
depristo
2009-08-07 19:00:06 +0000
db250f8d3eDon't print if not in learning mode
ebanks
2009-08-07 06:08:02 +0000
4c1fa52ddf-Added mapping quality zero filter -Set some reasonable defaults (based on pilot2)
ebanks
2009-08-07 03:18:02 +0000
bbd7bec5dbContinuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
depristo
2009-08-05 22:25:30 +0000
d60d5aa516Fixed bug: previously reset likelihoods after each region/exon. Better comments/documentation added
sjia
2009-08-05 18:44:46 +0000
0d47798721made booster distance a parameter
kcibul
2009-08-05 18:29:21 +0000
3b74b3ba74print out ref/alt ratio, not major/minor
ebanks
2009-08-05 16:36:25 +0000
48713e154cWindowed access to the reference.
hanna
2009-08-05 16:29:15 +0000
65e9dcf5b7Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which were being counted against everyone (5) likely found a stats bug in which heterozygosity was being applied incorrectly to the genotype priors
depristo
2009-08-05 01:00:55 +0000
4dc23f2763Trivial formatting changes as I moved more legacy code into this system
depristo
2009-08-05 00:54:26 +0000
34af669dbbExplicit ENUM representation of the diploid genotypes. Please use this from now on to represent strings like AA or AT
depristo
2009-08-05 00:53:43 +0000
5487ab0ee6Added several useful routines to MathUtils for summing and bounds checking of doubles
depristo
2009-08-05 00:41:31 +0000
21d1eba502Cleaned division of responsibilities between arguments to map function. Reference has been changed from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect the fact that it contains contextual information only about the alignments, not the locus in general.
hanna
2009-08-04 21:01:37 +0000
939b19e715Committing the first version of the homopolymer filter. Removes SNPs that occur at the edges of homopolymer runs and whose nonref allele matches the repeated base in the homopolymer.
mmelgar
2009-08-04 14:35:51 +0000
20ff603339New hotness and old and Busted genotype likelihood objects are now in the code base as I work towards a bug-free SSG along with a cleaner interface to the genotype likelihood object
depristo
2009-08-03 23:07:53 +0000
4986b2abd6Fixing bug in SSG -- genotyping and discovery were mixed up by name
depristo
2009-08-03 22:13:35 +0000
3485397483Reorganization of the genotyping system
depristo
2009-08-03 20:55:31 +0000
9f1d3aed26-Output single filtration stats file with input from all filters -move out isHet test to GenotypeUtils so all can use it
ebanks
2009-08-03 20:44:21 +0000
d88ea91939Slight reorganization of genotype interface
depristo
2009-08-03 19:19:11 +0000
880a01cb5dSlight reorganization of genotype interface
depristo
2009-08-03 19:18:41 +0000
d840a47b11Slight reorganization of genotype interface
depristo
2009-08-03 19:17:15 +0000
20986a03decleanup before moving files
depristo
2009-08-03 19:08:24 +0000
e3b08f245fPull out RMS calculation into MathUtils for all to use
ebanks
2009-08-03 17:00:20 +0000
e495b836d3- added mapping quality filter - make the filters brainless in that they strictly have thresholds and filter based on them; require user to calculate and input these thresholds. - update filters in preparation for migration to new output format
ebanks
2009-08-03 16:46:51 +0000
ba07f057acfinish the math for RMS
ebanks
2009-08-03 16:18:09 +0000
8bc925a216Commit on behalf of Mark: cleaning up some old and busted code in GenotypeLikelihood and associated objects.
kiran
2009-07-31 21:18:30 +0000
8d06bb21edA little gadget to select random samples from input stream(s) of unknown length. By default, selects a single line (with probability 1/TOTAL_NUMBER_OF_LINES_READ); with the -N option, randomly selects the specified number of lines. Can read from STDIN or from an arbitrary number of input streams (all streams will be merged). Examples: cat file1 file2 file3 | randomSampleFromStream.pl -N 5, or: randomSampleFromStream.pl file1 file2 file3
asivache
2009-07-31 18:55:14 +0000
9dfee7a75cthe "-genotype" option now acts correctly as a discovery mode caller in SSG
aaron
2009-07-31 18:31:45 +0000
c2c80dd946cleanup and moving some things around to more logical locations
aaron
2009-07-31 16:28:39 +0000
9a0761cd8faccidentally committed some debug code
aaron
2009-07-31 15:25:22 +0000
2f2c8576a5GLF output is now well validated, and some changes for new Genotypes interface code
aaron
2009-07-31 15:21:28 +0000
afccbc44ecScript that performs all the processing steps from raw Illumina reads through to analysis of barcoding and hybrid selection efficiency as documented in the GATK tutorial; can automatically run all steps in series on the farm.
andrewk
2009-07-31 00:22:53 +0000
eb4b9a743aScript that runs most of the steps involved in validating the CoverageEval system that predicts performance for given depth of sequencing coverage across a genome.
andrewk
2009-07-31 00:18:45 +0000
8eeb87af2aTests for downsampling related utilities in ListUtils class that didn't get checked in earlier
andrewk
2009-07-31 00:09:35 +0000
efd0fd1f0aShort python script that takes paired-end BAMs and aligns them with BWA. Referenced in GSA wiki tutorial
andrewk
2009-07-31 00:04:10 +0000
678c2533caRemoved custom output stream for file and replaced with the standard out PrintStream
andrewk
2009-07-30 22:36:42 +0000
2a7dfce9aefix the header string mismatch that Andrew found
aaron
2009-07-30 22:26:34 +0000
44673b2dceRemoved a debugging println that was accidentally checked in
andrewk
2009-07-30 22:23:27 +0000
845488ff94VariantEval now decides whether a variant is not confidently called using BestVsNextBest if genotypes are being evaluated and BestVsRef if not (variant discovery only). Also, the absolute value of the BestVsRef LOD (getVariantionConfidence) is used so that confident reference calls (if the GELI has output them) will show up in the final table as reference calls rather than no calls.
andrewk
2009-07-30 21:54:06 +0000
1c648a2d5fSkip compiled python files (*.pyc) in svn status output
andrewk
2009-07-30 21:45:23 +0000