Commit Graph

132 Commits (c8d7207a8e90e4d9fddf4b77c7370d3779aa414f)

Author SHA1 Message Date
depristo c8d7207a8e Fixed problem with GenomeLoc logic -- optimization was causing assertion failure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@138 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 19:53:00 +00:00
depristo 52ad08298a New FastaSequenceFile with support for poor-man's seek and querying the next contig name without loading the whole next contig into memory. Vastly speeds up the performance of jumping to distant parts of the genome with the location operator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@137 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 19:43:56 +00:00
depristo 4888df97c7 Added averageDouble function. How can we write a generic average function?!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@136 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 19:41:30 +00:00
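The commit above asks how a generic average function could be written. One possible answer, sketched here under the assumption of Java 5+ generics (it is not the code that was committed), is to bound the element type by `Number` and average through `doubleValue()`. The class name `GenericAverage` and the NaN-for-empty convention are hypothetical choices.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical utility: not part of the original commit.
public class GenericAverage {
    // Averages a list of any Number subclass (Integer, Double, Long, ...)
    // by converting each element to double. Returns NaN for an empty list.
    public static double average(List<? extends Number> values) {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (Number n : values) {
            sum += n.doubleValue();
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3, 4);
        List<Double> doubles = Arrays.asList(0.5, 1.5);
        System.out.println(average(ints));    // 2.5
        System.out.println(average(doubles)); // 1.0
    }
}
```

A single method then replaces per-type variants such as `averageDouble`, at the cost of always computing in double precision.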
jmaguire cf407168cf keep track of the position you're called on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@135 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 16:47:49 +00:00
jmaguire 096f0dbc68 don't run off the end of the list of loci.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@134 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 16:47:29 +00:00
jmaguire 4e0cd6ab84 Now works on single samples and computes metrics.
Here is example metrics output from a very small region:

	Allele Frequency Metrics (LOD >= 5)
	-------------------------------------------------
	Total loci                         : 14704
	Total called with confidence       : 10920 (74.27%)
	Number of Variants                 : 16 (0.15%) (1/682)
	Fraction of variant sites in dbSNP : 100.00%

Missing:
    Microarray (HapMap) concordance, TP/FP.

Optional:
    Histograms of depth of coverage, LOD, observed allele frequency, etc.

Still to implement:
    Propagate command line argument N (number of chromosomes) into walker to enable pooled calling.
    Take allele frequency priors as input.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@133 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 15:45:12 +00:00
jmaguire f7ad17016d some reformatting and logic cleanup in the comparison functions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@132 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 15:36:56 +00:00
jmaguire dfe50ce773 optionally check that the records are sorted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@131 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 15:36:24 +00:00
jmaguire 149ac3d96c Now iterate over a large set of tiny intervals efficiently.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@130 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-22 12:04:11 +00:00
asivache df2a7039cb Heinous bug fixed: only rookies forget that external conditions need to be re-checked after a loop ends on some other condition, duh! In addition, MSA piles are now seeded with a single read sequence each (if there are fewer than 4 reads, it might be hard to seed with two pairs).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@129 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 18:32:18 +00:00
kiran 411e5cf647 Added FourBaseCaller as a jar build target.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@128 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:59:13 +00:00
kiran 6e1fa7d61a Java version of basecaller that estimates probability distribution over four-base hypothesis space via an internal-control-initialized Gaussian mixture model over base channel intensities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@127 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:58:50 +00:00
kiran 3e350006e0 Added a directory to house some Illumina output parsers. Hopefully this will be merged back into Picard at some point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@126 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:55:56 +00:00
asivache 497eea2e5c minor changes and shuffling code around; also, now when realigned piles are printed they are sorted by start position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@125 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 17:43:49 +00:00
jmaguire 0ea44a5805 1st draft of support for a file containing a list of intervals.
Appears to work, but is inefficient:
at each reference location, the entire list of intervals is searched linearly.

Instead, we need to have the intervals sorted, and simply seek forward from interval to interval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@124 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-21 16:07:32 +00:00
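The fix proposed in the commit above (sort the intervals, then seek forward instead of scanning the whole list at every locus) can be sketched as a forward-only cursor. This assumes closed intervals on a single contig and loci visited in non-decreasing order; `SortedIntervalCursor` and its members are hypothetical names, not the GATK implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: class and method names are illustrative only.
public class SortedIntervalCursor {
    // Closed interval [start, stop] on a single contig (assumption).
    public static final class Interval {
        final long start;
        final long stop;
        public Interval(long start, long stop) { this.start = start; this.stop = stop; }
    }

    private final List<Interval> intervals;
    private int index = 0; // cursor into the sorted list; only moves forward

    public SortedIntervalCursor(List<Interval> input) {
        intervals = new ArrayList<>(input);
        intervals.sort(Comparator.comparingLong((Interval iv) -> iv.start));
    }

    // Returns true if pos falls inside any interval. Positions must be queried
    // in non-decreasing order. After sorting, each interval is skipped at most
    // once, so a full pass costs O(loci + intervals), not O(loci * intervals).
    public boolean contains(long pos) {
        while (index < intervals.size() && intervals.get(index).stop < pos) {
            index++; // interval ends before pos, so no later locus can hit it
        }
        return index < intervals.size() && intervals.get(index).start <= pos;
    }

    public static void main(String[] args) {
        SortedIntervalCursor cursor = new SortedIntervalCursor(
                Arrays.asList(new Interval(100, 105), new Interval(10, 12)));
        int hits = 0;
        for (long pos = 1; pos <= 200; pos++) {
            if (cursor.contains(pos)) hits++;
        }
        System.out.println(hits); // 9 (3 loci in [10,12] plus 6 in [100,105])
    }
}
```

Because intervals are sorted by start, if the cursor's current interval starts after `pos`, no later interval can contain `pos` either, so the single comparison is sufficient.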
hanna 1fcf4c0cbf Update picard to work with new samtools.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@123 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 21:51:26 +00:00
jmaguire 5dca560c3c A bunch of refactoring, and more on the way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@122 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 21:31:07 +00:00
hanna b806a9cf68 Updated for new version of samtools, which returns a sequence dictionary
rather than a simple list of sequences.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@121 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 20:38:24 +00:00
hanna 6e2d939905 Added subversion rev 180 of the sam library.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@120 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 20:17:51 +00:00
ebanks c5433a3120 dumps out base qualities per position for use in making boxplots
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@119 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 17:01:18 +00:00
jmaguire 1161c261ac made all data members public.
switched logOddsVarRef to LOD.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@118 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 16:44:17 +00:00
depristo 9b5e5e06f9 Now supports checking that the input files exist and are good
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@117 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 16:40:54 +00:00
ebanks f3f1b47808 deal with reverse complemented reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@115 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 16:01:49 +00:00
asivache 9ec96414c7
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@114 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 15:54:29 +00:00
depristo 322f4b944f Better stress test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@113 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 15:52:54 +00:00
asivache 3565b50ff5 main class (argument processing and traversing the reference) and implementation of all the Receiver functionality for building read piles over indels
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@112 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:18:04 +00:00
asivache 4c3b92b860 comparator for interval objects
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@111 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:15:13 +00:00
asivache f810412d75 equals(), hashCode() updated/added, also a few minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@110 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:13:07 +00:00
asivache 4badd54216 Indel also implements Interval interface but has its quirks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@109 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:11:17 +00:00
asivache 501e92d441 an interface for an interval object and a simple minimal implementation; note: in contrast to Arachne, this is a closed interval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@108 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:09:56 +00:00
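The commit above notes that the new interval is closed, unlike Arachne's. A minimal sketch of what "closed" means for length and overlap, assuming a `long` start/stop pair; the interface and class names here are hypothetical and not the committed code.

```java
// Hypothetical sketch: names are assumptions, not the committed interface.
public class ClosedIntervalDemo {
    public interface Interval {
        long getStart();
        long getStop();
    }

    // Closed interval [start, stop]: both endpoints are included.
    public static final class SimpleInterval implements Interval {
        private final long start;
        private final long stop;

        public SimpleInterval(long start, long stop) {
            if (stop < start) throw new IllegalArgumentException("stop < start");
            this.start = start;
            this.stop = stop;
        }

        public long getStart() { return start; }
        public long getStop() { return stop; }

        // Closed: [5, 5] has length 1, whereas a half-open [5, 5) would be empty.
        public long length() { return stop - start + 1; }

        // Two closed intervals overlap if they share at least one position.
        public boolean overlaps(Interval o) {
            return start <= o.getStop() && o.getStart() <= stop;
        }
    }

    public static void main(String[] args) {
        SimpleInterval a = new SimpleInterval(5, 10);
        SimpleInterval b = new SimpleInterval(10, 12);
        System.out.println(a.length());    // 6
        System.out.println(a.overlaps(b)); // true: both contain position 10
    }
}
```

The `+ 1` in `length()` and the `<=` comparisons in `overlaps()` are exactly where closed and half-open conventions diverge, which is why mixing the two conventions tends to produce off-by-one errors.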
asivache 29d2d460f3 a trivial interface and even more trivial implementations that do nothing (ignore the data they receive)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@107 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 05:08:15 +00:00
depristo b83c8319c7 Crushed subtle and potentially insidious bug in seeking within the fasta; a beer for anyone who can tell me the situation where this might arise...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@106 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-20 00:07:06 +00:00
depristo 34ee48fd82 Fixing output printing issues in the code, as well as adding more safety checks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@105 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 23:02:49 +00:00
hanna 6fdd622160 Describe how GATK finds walkers. Change the example to avoid copying the class file into the walkers directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@104 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 22:41:12 +00:00
hanna 104e2811ec Configure the plugin directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@103 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 22:12:25 +00:00
andrewk 6bcdac5c62 Restructured AlleleFrequency classes into 3 classes: AlleleFrequencyWalker, AlleleFrequencyMetricsWalker, AlleleFrequencyEstimate. AlleleFrequencyMetricsWalker class now calls mapper function of AlleleFrequencyWalker and works with the result. AlleleFrequencyEstimate is now a separate class instead of a subclass of AlleleFrequencyWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@102 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 22:06:01 +00:00
hanna 41fec1565c Hello, world! for GATK.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@101 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 21:46:22 +00:00
aaron 7bc45b68aa Added dependencies on two libraries: the Colt package, which is a collection of high-performance computing libraries from CERN; and Log4j, which will be our new logging platform.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@100 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 16:16:31 +00:00
andrewk 5fa99f430e One-line format is usable and two levels of debug output are available (debug = 1: one-line format; debug = 2: table of sampled probs for each locus). Class AlleleFrequencyMetrics computes %dbSNP and frequency of SNPs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@99 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 15:05:05 +00:00
depristo f1034f3dfd Stress test utility for pushing the GATK to its limits. Takes a list of SAM files and runs analyses on them all, optionally in the queue
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@98 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-19 03:15:00 +00:00
hanna 4242dba295 Remove endless iterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@97 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 23:53:40 +00:00
hanna 225ea64bd9 Moved extra walkers at Mark's request.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@96 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 23:52:08 +00:00
hanna ffb6f8f5da Move the basic gatk framework into the core subtree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@95 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 23:39:00 +00:00
asivache 69316f1873 removed unused import statement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@94 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:56:15 +00:00
asivache 875272e5c5 moved counted object to utils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@93 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:54:04 +00:00
asivache e09af2ef70 changed variable declaration from concrete class to interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@92 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:50:47 +00:00
asivache 708ada3e99 an accessory for CountedObject: builds a comparator for CountedObject<T> given a comparator for T; compares the underlying objects T themselves, *not* the associated counters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@91 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:45:54 +00:00
asivache 37101045af a simple wrapper class; less overhead than keeping a separate Integer counter object and going through object reallocation and/or autoboxing on each counter increment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@90 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 21:44:30 +00:00
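The two commits above describe a counter wrapper that avoids reallocating or autoboxing an `Integer` on every increment, and a comparator for `CountedObject<T>` built from a comparator for `T` that ignores the counters. A minimal sketch, assuming `CountedObject` pairs an object with a primitive `int` counter; the field and method names (`getObject`, `getCount`, `increment`, `byObject`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: field and method names are assumptions.
public class CountedObjectDemo {
    // Pairs an object with a mutable primitive counter, so incrementing
    // neither allocates nor autoboxes a new Integer.
    public static final class CountedObject<T> {
        private final T object;
        private int count;

        public CountedObject(T object) { this.object = object; }
        public T getObject() { return object; }
        public int getCount() { return count; }
        public void increment() { count++; }
    }

    // Lifts a Comparator<T> to a Comparator<CountedObject<T>> that compares
    // the wrapped objects only and ignores the counters.
    public static <T> Comparator<CountedObject<T>> byObject(final Comparator<? super T> cmp) {
        return (a, b) -> cmp.compare(a.getObject(), b.getObject());
    }

    public static void main(String[] args) {
        CountedObject<String> b = new CountedObject<>("b");
        CountedObject<String> a = new CountedObject<>("a");
        b.increment();
        b.increment();

        List<CountedObject<String>> items = new ArrayList<>();
        items.add(b);
        items.add(a);
        Comparator<CountedObject<String>> cmp = byObject(Comparator.<String>naturalOrder());
        items.sort(cmp);
        System.out.println(items.get(0).getObject() + " " + items.get(0).getCount()); // a 0
    }
}
```

Comparing only the wrapped objects lets the counted items be sorted or deduplicated by their content, regardless of how often each one has been seen.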
ebanks 45d2a9acd8 Added walker to print out a histogram of where mismatches occur in alignments
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@89 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 19:46:42 +00:00
hanna 1096bbd4d9 Moved build.xml, ivy.xml and settings to root of Sting repository.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@88 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-18 19:13:19 +00:00