Commit Graph

1096 Commits (ffeb3fd80dfccaf00a96d2009f326829c1ce1fdd)

Author SHA1 Message Date
kiran f838a5e511 Changed some double comparisons of the form a == b to abs(a - b) <= precision. Now we shouldn't be passing or failing some if conditions due to floating-point precision.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@388 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 20:05:46 +00:00
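The change described in this commit, replacing `a == b` with `abs(a - b) <= precision`, can be illustrated with a minimal sketch. The helper name `approx_equal` and the default tolerance of `1e-9` are assumptions for illustration; the commit does not say which precision value the code uses.

```python
import math


def approx_equal(a: float, b: float, precision: float = 1e-9) -> bool:
    """Return True when a and b differ by no more than `precision`.

    `precision` is a hypothetical default; pick a value suited to the data.
    """
    return abs(a - b) <= precision


total = 0.1 + 0.2

# Exact comparison is sensitive to floating-point rounding.
print(total == 0.3)

# Tolerance-based comparison, as described in the commit message.
print(approx_equal(total, 0.3))

# The standard library offers the same check with an absolute tolerance.
print(math.isclose(total, 0.3, rel_tol=0.0, abs_tol=1e-9))
```

An absolute tolerance works well when the compared values have a known scale (probabilities, for example); for values spanning many orders of magnitude a relative tolerance may be the better choice.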
asivache d44c30154a added MAX_READ_LENGTH - now we can ignore long reads (454?); a bad idea in general, but the performance hit is too hard to take, at least for preliminary testing runs...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@384 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 16:53:12 +00:00
jmaguire 6652f13a17 more verbose gff output!
EVEN MORE verbosity to come! 

Tremble in anticipation.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@382 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:21:23 +00:00
jmaguire 6e180ed44e Unified caller is go.
AlleleFrequencyWalker and related classes work equally well for 2 or 200 chromosomes. 

Single Sample Calling:

	Allele Frequency Metrics (LOD >= 5)
	-------------------------------------------------
	Total loci                            : 171575
	Total called with confidence          : 168615 (98.27%)
	Number of variants                    : 111 (0.07%) (1/1519)
	Fraction of variant sites in dbSNP    : 87.39%
	-------------------------------------------------
	
    Hapmap metrics are coming up all zero. Will fix.

Pooled Calling:

	AAF r-squared after EM is 0.99. 
    AAF r-squared after EM for alleles < 20% (in pools of ~100-200 chromosomes) is 0.95 (0.75 before EM)

    Still not using fractional genotype counts in EM. That should improve r-squared for low frequency alleles.


Chores still outstanding:
    - make a real pooled caller walker (as opposed to my experiment framework).
    - add fractional genotype counts to EM cycle.
    - add pool metrics to the metrics class? *shrug* we don't really have truth outside of a contrived experiment...



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@380 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:29:51 +00:00
asivache b4136b6d6e a few tweaks to make it more robust: ignore reads with cigars containing anything but I,D,M; don't set up contig ordering manually, rely upon reference sequence and its dictionary; don't die if a record does not have NM tag, but fall back to direct counting instead; now requires reference as a cmdline arg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@378 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 04:49:19 +00:00
kiran c51f51f255 Make sure we always write at least 1000 points per base in each cycle's scatterplot. Print the disagreement rate between Bustard and FourBaseRecaller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@375 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:49:41 +00:00
kiran 35fc002d5d Debugging information is now written in such a way to make it easier to import into R.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@372 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:45:33 +00:00
kiran 6ee4fe5a20 Fixed a Bustard/Firecrest file synchronization bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@371 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:44:07 +00:00
kiran 817278be46 If a SAMRecord is on the negative strand, reverse complement the SQ tag.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@370 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:42:24 +00:00
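As a rough illustration of the strand handling mentioned here, the sketch below reverse-complements a plain base string and reverses a parallel list of per-base values for reads on the negative strand. It does not decode the SQ tag itself, whose byte layout is not described in this log; the function names and the `negative_strand` flag are hypothetical.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(bases: str) -> str:
    """Complement every base, then reverse the sequence."""
    return bases.translate(COMPLEMENT)[::-1]


def orient_to_strand(bases: str, values: list, negative_strand: bool):
    """Return bases and per-base values in the orientation implied by the strand.

    Per-base values (qualities, for instance) are only reversed, not complemented.
    """
    if not negative_strand:
        return bases, list(values)
    return reverse_complement(bases), list(reversed(values))


print(orient_to_strand("AAC", [10, 20, 30], negative_strand=True))
```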
kiran 1d5a22cacf Extracts a Fastq file and the SQ tags to a separate file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@369 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:41:44 +00:00
kiran e410c005c0 A debugging tool to ensure the SQ tag in a four-prob SAM file matches the SAMRecord strand orientation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@368 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:40:42 +00:00
kcibul ce72932a45 * refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
* added basic unit test for GenomeLoc
* fixed bug when parsing genome locations like chr1:5000: the start position was being left as maxint rather than being set to the same as the stop position.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
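The single-position fix described in this commit can be sketched as follows. This is not the GenomeLoc implementation; the `Interval` type, the `parse_loc` function, and the accepted syntax (`contig:pos` or `contig:start-stop`) are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int
    stop: int


# Hypothetical grammar: "contig:pos" or "contig:start-stop".
_LOC = re.compile(r"^([^:]+):(\d+)(?:-(\d+))?$")


def parse_loc(text: str) -> Interval:
    """Parse a location string; a single position yields start == stop."""
    match = _LOC.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    contig, first, second = match.groups()
    start = int(first)
    # When no stop is given, the interval covers exactly one base.
    stop = int(second) if second is not None else start
    if stop < start:
        raise ValueError(f"stop precedes start in {text!r}")
    return Interval(contig, start, stop)


print(parse_loc("chr1:5000"))
print(parse_loc("chr1:1-10"))
```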
kiran 2b59110dca CombineSamAndFourProbs is better.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@358 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:19:53 +00:00
kiran 56aa98ad30 Ignore null values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@357 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:18:20 +00:00
kiran 2ef2c9e121 Fixed an issue wherein the SQ field was only being pulled from the first read of the pileup, no matter what. Fixed an issue wherein Andrew enumerates his bases as A:0, C:1, T:2, G:3, and Kiran's QualityUtils methods enumerate bases as A:0, C:1, G:2, T:3 (we should standardize this). Fixed an issue wherein the remaining probability was being divided by 3 rather than 2 when four-base probs are enabled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@356 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:17:53 +00:00
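One way to read the last fix in this commit: when both the best and second-best base probabilities are known (four-base probs), the leftover probability is shared among the two remaining bases, whereas with only the best base known it is shared among three. The sketch below works under that assumption; the function name, its arguments, and the fixed `ACGT` ordering are hypothetical and are not taken from QualityUtils.

```python
BASES = "ACGT"  # a single agreed ordering: A:0, C:1, G:2, T:3


def base_distribution(best, p_best, second=None, p_second=0.0):
    """Return probabilities in BASES order, spreading the remainder evenly.

    With only `best` known the remainder is split three ways; with `second`
    also known it is split two ways.
    """
    known = {best: p_best}
    if second is not None:
        if second == best:
            raise ValueError("second-best base must differ from best base")
        known[second] = p_second
    others = [b for b in BASES if b not in known]
    remainder = 1.0 - sum(known.values())
    share = remainder / len(others)
    return [known.get(b, share) for b in BASES]


print(base_distribution("A", 0.7))
print(base_distribution("A", 0.7, "G", 0.2))
```

Indexing every distribution through one shared `BASES` constant is also a simple way to avoid the ordering mismatch noted in the same commit.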
depristo 17b3d5b554 New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to change GenomeAnalysisTK.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
kiran f5cc2d8b0b Commented out import of IlluminaParser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@354 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:30:29 +00:00
kiran c5220c0822 Four-base probs are now decoded with the relevant method in QualityUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@351 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:52:17 +00:00
kiran 9bc763a835 A better (aka 'working') tool for combining four-base probs with an aligned sam file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@350 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:51:37 +00:00
kiran b7a2e82b46 Can optionally process raw or corrected intensities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@349 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:50:11 +00:00
kiran 6cdad10dd1 Make output type identical to the bustard parser so the values can be easily swapped for one another.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@348 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:49:34 +00:00
kiran d0ce56e018 Remember to take the strand flag into account when calculating error rate per cycle as a surrogate for instrument performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@347 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:48:45 +00:00
kcibul c556a97f17 Skeleton of Somatic Coverage tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@342 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 02:34:03 +00:00
kiran 089bf30cf4 Send things to the out file via the logger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@339 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 21:49:03 +00:00
kiran 6db9a00a0b SAMFileWriter doesn't appear to flush the buffer when its destructor is called. You have to call the close() method. Also, choose a random base for Ns in the forward and reverse strands so that samtools doesn't pitch a fit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@338 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 21:48:24 +00:00
kiran eb2f0ebd62 If the first base of a read is 'N', and the alignment cigar says every base matches, samtools calls shenanigans. Now I just output an A, but the real way to do this is to modify the cigar string accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@337 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:58:18 +00:00
kiran 0e7d962eca Oops. Slight twiddle of the math here so that I'm not asking if bestBase == nextBestBase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@336 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:56:54 +00:00
kiran 62ac7366ed A quick hack to ensure that the sequence, qualities, and secondary qualities are in accordance with the strand flag.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@331 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 15:57:28 +00:00
kiran 25474ebe7e Computes the read error rate for a bam file. Ignores reads with indels, treats low-quality and high-quality reference bases the same. Does not count ambiguous reference bases as mismatches. Optionally allows for best two bases in read to be used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@330 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 15:56:10 +00:00
asivache 8d48bdc9ec it walks... the version committed actually counts snps only
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@328 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 02:00:41 +00:00
asivache 62d75ced3c nothing fancy, just a wrapper (aka struct) to pass around a bunch of counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@327 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 01:58:57 +00:00
hanna 202c501939 Added a sample xml marshaller / unmarshaller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@322 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:28:16 +00:00
kiran 99579a1ef8 Math correction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@310 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 02:18:13 +00:00
kiran 9be978e006 Intermediate commit (debugging info).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@309 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 01:20:15 +00:00
kiran 5a5c6d1276 Added some debugging stuff (writes model parameters to one file per cycle).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@304 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 22:00:58 +00:00
ebanks 3f75fc4e83 Unfortunately, because BWA occasionally outputs crazy reads, we need
to make sure not to have an ArrayIndexOutOfBoundsException thrown.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@297 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 03:51:35 +00:00
kiran f12d40dde8 Simplified SAMRecord construction and emission.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@296 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-05 04:48:31 +00:00
depristo 4eac3193f7 Added RefMetaDataTracker system as a replacement for the List<ReferenceOrderedData> going into walkers. This system allows you to more easily get a tracker for processing using the lookup(name, default) system. See Pileup for an example.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@292 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:54:54 +00:00
kiran ef06924f73 JavaDocs!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@290 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:19:17 +00:00
andrewk bef475778f - Updated --hapmap switch to --hapmap-chip to reflect the data being chip data for an individual rather than population allele frequency data in Hapmap
- Corrected some bugs to get metrics logging working
- Added a switch --force_1base_probs to ignore 4-base probabilities if they exist


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@287 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:32:31 +00:00
depristo edc44807af RODs now have names. Use getName() to access them. Next step is a better interface for accessing RODs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@286 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 16:41:33 +00:00
kiran 5019971290 Now outputs four-base SAM record (read name prefixed with KIR) and bustard SAM record (prefixed with BUS) for easy debugging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@285 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:48:51 +00:00
kiran 15151ac125 Corrected the use of the prior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@284 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:47:47 +00:00
kcibul 9bbce32064 Basic dbSNP and HapMap frequency aware SNP caller... still in progress
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@282 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 14:24:09 +00:00
depristo f031d882c6 ByReference traversals!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@281 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 13:23:18 +00:00
andrewk e3ac0cb500 - A lot of code cleaned up; separated metrics code from AlleleFrequencyMetricsWalker into AlleleMetrics and eliminated the former class. AFMW (aside from being a name so long that it warrants an acronym) can now be implemented by passing an option to AlleleFrequencyWalker that logs metrics to a file.
- AlleleMetrics and AlleleMetricrsWalker are now ready to take a list of classes that implement the AllelicVariant interface
- Switched a genome location in AlleleFrequencyEstimate from String to GenomeLoc which makes way more sense.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@280 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 02:09:10 +00:00
kiran 7d889c0661 Refactored into oblivion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@276 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:12:15 +00:00
kiran dffc879240 Should now be appropriately using Bustard data to call bases (there are some mathematical subtleties that arise when no longer using ICs as initialization data). Also writes some more relevant fields in the SAM records. WAAAAAY simpler than old version. Like, super way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@275 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:10:13 +00:00
kiran 59334b0270 A convenience class for manipulating base probability distributions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@274 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:08:31 +00:00
kiran 399d9b8c1e A class that represents the model parameters for all of the Gaussian models for all cycles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@273 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:08:10 +00:00
kiran f0f94b6c72 A class that represents the model parameters for all of the Gaussian models at a given cycle. Handles the accumulation of parameter initialization data and provides for efficient computation of base probability distribution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@272 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:07:47 +00:00
jmaguire 8ce4dabd7c Print coverage per reference base for each sample in a merged BAM file.
This is a good example of how to untangle a merged BAM file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@269 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 21:35:31 +00:00
asivache 5d9b068b8b generic declarations added here and there to eliminate a few annoying warnings; no consequential changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@268 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:53:01 +00:00
kcibul c192a95998 changes in three files to make the HapMap RODs work:
- HapMapAlleleFrequenciesROD.java - the referenceOrderedDatum implementation
 - PrepareROD.java - has a static block that loads the known ROD classes, had to add the above
 - GenomeAnalysisTK.java - when supplied a hapmap argument... loads the ROD

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@265 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 19:55:19 +00:00
jmaguire d202264b23 initial add of pooled calling experiment walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@262 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 17:55:40 +00:00
depristo 24e8581c30 Slight improvements to allele caller interface; fixed problem with printing progress
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@260 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:44:12 +00:00
asivache 20d4bcbb2e I said - delete!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@259 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:21:21 +00:00
jmaguire 25ace306b9 GenomeAnalysisTK: better documentation of validation option.
AlleleFrequencyWalker: output the last reference interval if it's left hanging open.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@258 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:11:20 +00:00
asivache f26055c926 interface representing allele variants/genotype calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@256 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 15:57:19 +00:00
jmaguire f42b75da72 restore GFF_OUTPUT_FILE to a required argument.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@255 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 14:34:08 +00:00
depristo 2cd9a1597f Simple improvements to allele caller
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@254 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 14:09:14 +00:00
jmaguire 4faacac315 Now handle the case where we don't actually SEE all of the positions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@248 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 19:50:07 +00:00
jmaguire 675505646d now makes confident reference intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@247 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 18:46:14 +00:00
jmaguire ede52f7359 - take command line arguments
- output GFF lines to a file (specified by a command line argument)
- improve the GFF output string


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@240 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 18:43:00 +00:00
ebanks 907c183242 update walkers so that onTraversalDone works (it now takes an arg)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@235 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 15:05:33 +00:00
ebanks 3896cc8f17 Moved avg depth of coverage functionality into the core depth of coverage
walker.  Used new command line args for walkers.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@234 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 05:02:33 +00:00
ebanks 007ecc8616 Added a stateless walker to give the average depth of coverage for given reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@233 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 02:33:59 +00:00
jmaguire 875802e8fc print output as a GFF line.
still need to add printing GFF intervals for stretches of confident reference calls.

does the GFF ROD class handle intervals?? We'll find out. >:)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@225 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 17:47:35 +00:00
jmaguire b752960586 rearranged some stuff and eliminated the binomial prior in the N!=2 case. Much faster.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@224 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 17:26:05 +00:00
depristo d7c0bcc223 Reorganized GenomeLoc code to more clearly and better use the picard SequenceDictionary information.
All GenomeLoc[] are now ArrayList<GenomeLoc> for clarity and consistency
Parsing now recursively merges contiguous elements chr1:1-10;chr1:11-20 => chr1:1-20
Added support for TraversingByLoci over all reference positions specified by the provided location array.  System dynamically determines which traversal system to use.
Pileup now marks, very clearly, reference positions without covered reads.
Made changes around the codebase to deal with new GenomeLoc structure.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@218 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-28 20:37:27 +00:00
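The merge rule quoted in this commit (`chr1:1-10;chr1:11-20 => chr1:1-20`) can be sketched as below. The commit describes a recursive merge; this sketch uses a single left-to-right pass over intervals that are assumed to be already sorted by contig and start, and it also merges overlapping intervals, which is an assumption beyond what the message states. The tuple representation and function name are hypothetical.

```python
def merge_contiguous(intervals):
    """Merge adjacent or overlapping (contig, start, stop) tuples.

    Assumes `intervals` is sorted by contig order, then by start.
    Coordinates are inclusive, so 1-10 and 11-20 are contiguous.
    """
    merged = []
    for contig, start, stop in intervals:
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            prev_contig, prev_start, prev_stop = merged[-1]
            # Extend the previous interval rather than starting a new one.
            merged[-1] = (prev_contig, prev_start, max(prev_stop, stop))
        else:
            merged.append((contig, start, stop))
    return merged


print(merge_contiguous([("chr1", 1, 10), ("chr1", 11, 20)]))
```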
hanna 4a6be896b9 Provide out and err PrintStreams to the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@213 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 15:03:32 +00:00
asivache c6d9848d08 synchronizing latest changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@212 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 14:15:44 +00:00
hanna 53fe9acf65 Make command-line arguments available in walker constructor, provide back door from
walker into GATK itself, do some cleanup of output messages, and add some bug fixes.
Command-line arguments in walkers are now feature-complete, but still a bit messy.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@203 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 20:45:27 +00:00
hanna 5f9010116a Collapse the walker hierarchy, in preparation for in-walker output streams and less hokey walker args.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@201 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 16:22:35 +00:00
depristo 7cad3acc61 Support for dynamically merging data files. Preliminary only -- everything in these systems is still being tested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@200 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 14:40:50 +00:00
asivache f47a214f96 massive changes everywhere; lots of bugs fixed; methods moved around; computation and printout of overall stats added; now decides whether to accept or reject 'improvement'; writes alignments into two output sam files (unmodified reads/failed piles into one, realigned piles into the other); special treat for paranoids: writes third sam file with all the analyzed reads, unmodified
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@197 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 02:26:17 +00:00
andrewk 0331cd8e95 Updated AlleleFrequency* classes to calculate separate lods for VarVsRef and BestVsNextBest mixture (qstar) theories; AFWMetrics now reports single sample performance w.r.t. Hapmap chip using the appropriate lod for genotyping (BestVsNextBest) or variant / reference calling (VarVsRef).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@196 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 02:10:18 +00:00
andrewk c88a17dfee AlleleFrequencyWalker now can parse 4-base probs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@195 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 20:33:05 +00:00
jmaguire 2ed63fe17c a bunch of changes that support pools.
they don't appear to break single sample:

	Allele Frequency Metrics (LOD >= 5)
	-------------------------------------------------
	Total loci                            : 9000
	Total called with confidence          : 8138 (90.42%)
	Number of variants                    : 11 (0.14%) (1/739)
	Fraction of variant sites in dbSNP    : 81.82%



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@192 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 18:52:42 +00:00
kiran 607731da91 Fixed a harmless (but annoying) bug wherein the read name for the SAMRecords increases by two on every iteration rather than one.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@189 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:20:29 +00:00
jmaguire 44acc358b7 Add a "notes" member to the AlleleFrequencyEstimate, e.g. for hapmap metadata.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@188 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:18:10 +00:00
asivache 4c29dca70d
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@186 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 09:23:42 +00:00
asivache 71d3e8e99b fixed another bug in gapped alignment computation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@185 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 08:33:57 +00:00
asivache 40f45c2333
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@184 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 05:48:10 +00:00
andrewk 30babbf5b9 Restructured AlleleFrequencyMetricsWalker to correctly report Hapmap concordance numbers for genotyping and added reporting for Hapmap reference/variant calling. Also, tiny bugfix in interval code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@181 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 01:12:05 +00:00
kiran 28c1330b4b Fixed a bug wherein the loop variable for the second end of the pair was actually looping over the entire raw read (first and second ends combined).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@178 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 21:59:25 +00:00
kiran 499c422de6 A version of the four-base caller that computes the probability distribution over base call space by initializing off the Bustard calls rather than the ICs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@173 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 20:11:39 +00:00
asivache 4222016bf5 stop printing sw matrix and other debug info
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@171 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 18:15:52 +00:00
asivache 8ea8a74fbf fixed bug in calculation of alignment start offset for negative offsets; toString() added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@170 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 18:05:28 +00:00
asivache 9aa1ccd9b7 fixed some bugs in calling the optimal path; parameters adjusted (?)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@169 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 17:27:51 +00:00
kiran 88d94d407a Fixed a bug in the parsing of the second end of the pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@168 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 14:34:37 +00:00
asivache 786a7845dd
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@167 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 14:06:44 +00:00
asivache 3d1e0bf079
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@166 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 14:06:24 +00:00
asivache 908065125f computes Smith-Waterman pairwise alignment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@164 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 05:36:37 +00:00
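For readers unfamiliar with the algorithm named in this commit, here is a minimal sketch of Smith-Waterman local alignment scoring. It computes only the best local score with a linear gap penalty and no traceback; the scoring parameters (`match=2`, `mismatch=-1`, `gap=-1`) are illustrative assumptions, not the values used in the committed class.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Clamping at zero lets an alignment restart anywhere (local alignment).
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("ACGT", "ACGT"))
```

Recovering the aligned region itself would additionally require tracing back from the highest-scoring cell until a zero is reached.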
andrewk 9dee9ab51c Added Hapmap data track (using rodGFF class for GFF file format) to toolkit as a command line option, Hapmap metrics to AlleleFrequencyMetricsWalker, and a python Geli2GFF file converter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@163 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 03:58:03 +00:00
hanna 63cd1fe201 Push core / playground lower into the tree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@160 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-23 23:19:54 +00:00