Commit Graph

7430 Commits (5e832254a4e024378f7fdee252abf7df9e289c6a)

Author SHA1 Message Date
depristo 4888df97c7 Added averageDouble function. How can we write a generic average function?!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@136 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 19:41:30 +00:00
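Note on the question raised in the commit above: one possible way to write a generic average in Java is to accept any collection of `Number` subtypes and widen each element to `double`. This is only a sketch under that assumption; `AverageSketch` and `average` are hypothetical names and are not the `averageDouble` function added in this commit.

```java
import java.util.Collection;

// Hypothetical helper for illustration; not code from the repository.
final class AverageSketch {
    private AverageSketch() {}

    // Accepts Integer, Long, Double, etc. through the bounded wildcard.
    // Every element is widened to double, so very large long values
    // can lose precision.
    static double average(Collection<? extends Number> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("cannot average an empty collection");
        }
        double sum = 0.0;
        for (Number n : values) {
            sum += n.doubleValue();
        }
        return sum / values.size();
    }
}
```

For example, `AverageSketch.average(Arrays.asList(1, 2, 3, 4))` returns 2.5. The trade-off is that the result is always a `double`, whatever the element type.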
jmaguire cf407168cf keep track of the position you're called on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@135 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 16:47:49 +00:00
jmaguire 096f0dbc68 don't run off the end of the list of loci.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@134 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 16:47:29 +00:00
jmaguire 4e0cd6ab84 Now works on single samples and computes metrics.
Here is an example metrics output from a very tiny region:

	Allele Frequency Metrics (LOD >= 5)
	-------------------------------------------------
	Total loci                         : 14704
	Total called with confidence       : 10920 (74.27%)
	Number of Variants                 : 16 (0.15%) (1/682)
	Fraction of variant sites in dbSNP : 100.00%

Missing:
    Microarray(hapmap) concordance, tp/fp.

Optional:
    Histograms of depth of coverage, LOD, observed allele frequency, etc.



Still to implement:
    Propagate command line argument N (number of chromosomes) into walker to enable pooled calling.
    Take allele frequency priors as input.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@133 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 15:45:12 +00:00
jmaguire f7ad17016d some reformatting and logic cleanup in the comparison functions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@132 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 15:36:56 +00:00
jmaguire dfe50ce773 optionally check that the records are sorted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@131 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 15:36:24 +00:00
jmaguire 149ac3d96c Now iterate over a large set of tiny intervals efficiently.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@130 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 12:04:11 +00:00
asivache df2a7039cb Heinous bug fixed: only rookies forget that external conditions need to be re-checked after a loop ends on some other condition, duh! In addition, msa piles are now seeded with a single read sequence each (if there are fewer than 4 reads it might be hard to seed with two pairs)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@129 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 18:32:18 +00:00
kiran 411e5cf647 Added FourBaseCaller as a jar build target.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@128 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:59:13 +00:00
kiran 6e1fa7d61a Java version of basecaller that estimates probability distribution over four-base hypothesis space via an internal-control-initialized Gaussian mixture model over base channel intensities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@127 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:58:50 +00:00
kiran 3e350006e0 Added a directory to house some Illumina output parsers. Hopefully this will be merged back into Picard at some point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@126 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:55:56 +00:00
asivache 497eea2e5c minor changes and shuffling code around; also, now when realigned piles are printed they are sorted by start position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@125 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:43:49 +00:00
jmaguire 0ea44a5805 1st draft of support for a file containing a list of intervals.
Appears to work, but inefficient:
At each reference location, the entire list of intervals is searched linearly.

Instead we need to have the intervals sorted, and simply seek forward from interval to interval.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@124 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 16:07:32 +00:00
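Note on the improvement proposed in the commit above: the sorted "seek forward" idea can be sketched as a cursor over intervals sorted by start position. The sketch assumes positions on a single contig are queried in non-decreasing order; `IntervalCursor` and `Span` are hypothetical names, not classes from the repository.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of seeking forward through sorted intervals.
final class IntervalCursor {
    // Closed interval [start, stop] on one contig (assumed representation).
    static final class Span {
        final long start;
        final long stop;
        Span(long start, long stop) { this.start = start; this.stop = stop; }
    }

    private final List<Span> sorted;
    private int cursor = 0;

    IntervalCursor(List<Span> spans) {
        sorted = new ArrayList<Span>(spans);
        // Sort once up front so queries never need to rescan the list.
        Collections.sort(sorted, new Comparator<Span>() {
            public int compare(Span a, Span b) {
                return Long.compare(a.start, b.start);
            }
        });
    }

    // Callers must pass positions in non-decreasing order.
    boolean contains(long pos) {
        // Skip intervals that end before pos; they can never match again.
        while (cursor < sorted.size() && sorted.get(cursor).stop < pos) {
            cursor++;
        }
        return cursor < sorted.size() && sorted.get(cursor).start <= pos;
    }
}
```

Because the cursor only moves forward, a full pass over the reference costs time proportional to the number of positions plus the number of intervals, rather than their product.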
hanna 1fcf4c0cbf Update picard to work with new samtools.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@123 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 21:51:26 +00:00
jmaguire 5dca560c3c A bunch of refactoring, and more on the way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@122 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 21:31:07 +00:00
hanna b806a9cf68 Updated for new version of samtools, which returns a sequence dictionary
rather than a simple list of sequences.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@121 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 20:38:24 +00:00
hanna 6e2d939905 Added subversion rev 180 of the sam library.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@120 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 20:17:51 +00:00
ebanks c5433a3120 dumps out base qualities per position for use in making boxplots
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@119 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 17:01:18 +00:00
jmaguire 1161c261ac made all data members public.
switched logOddsVarRef to LOD.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@118 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 16:44:17 +00:00
depristo 9b5e5e06f9 Now supports checking that the input files exist and are good
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@117 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 16:40:54 +00:00
ebanks f3f1b47808 deal with reverse complemented reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@115 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 16:01:49 +00:00
asivache 9ec96414c7
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@114 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 15:54:29 +00:00
depristo 322f4b944f Better stress test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@113 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 15:52:54 +00:00
asivache 3565b50ff5 main class (argument processing and traversing the reference) and implementation of all the Receiver functionality for building read piles over indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@112 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:18:04 +00:00
asivache 4c3b92b860 comparator for interval objects
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@111 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:15:13 +00:00
asivache f810412d75 equals(), hashCode() updated/added, also a few minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@110 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:13:07 +00:00
asivache 4badd54216 Indel also implements Interval interface but has its quirks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@109 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:11:17 +00:00
asivache 501e92d441 an interface for an interval object and simple minimum implementation; note: in contrast to arachne, this is closed interval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@108 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:09:56 +00:00
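Note on the closed-interval convention mentioned in the commit above: with closed intervals both endpoints belong to the interval, which changes the length and overlap arithmetic compared with half-open intervals. The class below is a hypothetical sketch of that convention only; it is not the interface or implementation committed here, whose method names are not shown in this log.

```java
// Hypothetical closed interval [start, stop]; both endpoints are included.
final class ClosedInterval {
    final long start;
    final long stop;

    ClosedInterval(long start, long stop) {
        if (stop < start) {
            throw new IllegalArgumentException("stop must not be less than start");
        }
        this.start = start;
        this.stop = stop;
    }

    // A closed interval covering a single position has length 1, hence the +1.
    long length() {
        return stop - start + 1;
    }

    boolean contains(long pos) {
        return start <= pos && pos <= stop;
    }

    // Intervals that merely touch at one endpoint still overlap.
    boolean overlaps(ClosedInterval other) {
        return start <= other.stop && other.start <= stop;
    }
}
```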
asivache 29d2d460f3 a trivial interface and even more trivial implementations that do nothing (ignore the data they receive)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@107 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:08:15 +00:00
depristo b83c8319c7 Crushed subtle and potentially insidious bug in seeking within the fasta; a beer for anyone who can tell me the situation where this might arise...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@106 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 00:07:06 +00:00
depristo 34ee48fd82 Fixing output printing issues in the code, as well as adding more safety checks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@105 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 23:02:49 +00:00
hanna 6fdd622160 Describe how GATK finds walkers. Change the example to avoid copying the class file into the walkers directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@104 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 22:41:12 +00:00
hanna 104e2811ec Configure the plugin directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@103 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 22:12:25 +00:00
andrewk 6bcdac5c62 Restructured AlleleFrequency classes into 3 classes: AlleleFrequencyWalker, AlleleFrequencyMetricsWalker, AlleleFrequencyEstimate. AlleleFrequencyMetricsWalker class now calls mapper function of AlleleFrequencyWalker and works with the result. AlleleFrequencyEstimate is now a separate class instead of a subclass of AlleleFrequencyWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@102 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 22:06:01 +00:00
hanna 41fec1565c Hello, world! for GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@101 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 21:46:22 +00:00
aaron 7bc45b68aa Added dependencies on two libraries: the Colt package, which is a collection of high performance computing libraries from CERN; and Log4j, which will be our new logging platform.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@100 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 16:16:31 +00:00
andrewk 5fa99f430e One-line format is usable and two levels of debug output are available (debug = 1: one-line format, debug = 2: table of sampled probs for each locus). Class AlleleFrequencyMetrics computes %dbSNP and frequency of SNPs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@99 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 15:05:05 +00:00
depristo f1034f3dfd Stress Test utility for pushing the GATK to its limits. Takes a list of sam files and runs Analyses on them all, optionally in the queue
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@98 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 03:15:00 +00:00
hanna 4242dba295 Remove endless iterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@97 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 23:53:40 +00:00
hanna 225ea64bd9 Moved extra walkers at Mark's request.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@96 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 23:52:08 +00:00
hanna ffb6f8f5da Move the basic gatk framework into the core subtree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@95 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 23:39:00 +00:00
asivache 69316f1873 removed unused import statement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@94 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:56:15 +00:00
asivache 875272e5c5 moved counted object to utils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@93 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:54:04 +00:00
asivache e09af2ef70 changed variable declaration from concrete class to interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@92 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:50:47 +00:00
asivache 708ada3e99 an accessory for CountedObject: builds a comparator for CountedObject<T> given a comparator for T; compares the underlying objects T themselves, *not* the associated counters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@91 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:45:54 +00:00
asivache 37101045af a simple wrapper class; less overhead than keeping a separate Integer counter object and going through object reallocation and/or autoboxing on each counter increment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@90 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:44:30 +00:00
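Note on the two commits above: the counter wrapper and its comparator can be sketched as follows. The wrapper holds a mutable primitive `int`, so incrementing does not allocate or autobox, and the comparator delegates to a comparator on the wrapped objects while ignoring the counters. `Counted` and `CountedComparators` are hypothetical stand-ins for the `CountedObject` classes named in the messages, and starting the count at zero is an assumption.

```java
import java.util.Comparator;

// Hypothetical stand-in for the CountedObject wrapper; not repository code.
final class Counted<T> {
    private final T object;
    private int count = 0; // primitive counter: no Integer allocation per increment

    Counted(T object) { this.object = object; }

    T get() { return object; }
    int count() { return count; }
    void increment() { count++; }
}

final class CountedComparators {
    private CountedComparators() {}

    // Orders wrappers by the wrapped objects only; counters play no role.
    static <T> Comparator<Counted<T>> byObject(final Comparator<? super T> inner) {
        return new Comparator<Counted<T>>() {
            public int compare(Counted<T> a, Counted<T> b) {
                return inner.compare(a.get(), b.get());
            }
        };
    }
}
```

A comparator built this way lets the wrappers live in a sorted collection keyed by the underlying objects while their counts change freely.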
ebanks 45d2a9acd8 Added walker to print out a histogram of where mismatches occur in alignments
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@89 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 19:46:42 +00:00
hanna 1096bbd4d9 Moved build.xml, ivy.xml and settings to root of Sting repository.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@88 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 19:13:19 +00:00
hanna d46ee96269 Added support for loose Walker class files in walkers directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@87 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 17:32:24 +00:00
ebanks fe9e52c47e allow on-the-fly sorting AND validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@86 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 15:50:17 +00:00