Commit Graph

23 Commits (092a754071af6c67f57d4f273d58d4056386db25)

Author SHA1 Message Date
aaron 37efd78c7e fixed the logger call so we get output that indicates this class generated the message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@911 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:02:17 +00:00
ebanks 36fb6ca3c5 Allow user to specify the compression to be used when writing out BAM files.
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00
asivache d601548d53 added reallocate(int[] orig_array, int new_size) and int[] indexOfAll(String s, int ch); the former is self-explanatory, while the latter returns array of indices of all occurences of ch in the specified string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@856 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 20:15:00 +00:00
hanna 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
depristo d261459c48 Useful function to create a string with N copies of a same char
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@784 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:23:52 +00:00
hanna 23e9e29964 Changed reads traversals from providing a LocusContext from which the reference sequence
could be extracted to a char[] containing the reference bases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
depristo e848f34896 countOccurances of char in string and max of a list of bytes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@622 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:03:49 +00:00
jmaguire 6cef8bd76c added k-best quality path enumeration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@497 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 20:26:51 +00:00
jmaguire af6788fa3d Misc:
1. Added logGamma function to utils
2. Required asserts to be enabled in the allele caller (run with java -ea)
3. put checks and asserts of NaN and Infinity in AlleleFrequencyEstimate
4. Added option FRACTIONAL_COUNTS to the pooled caller (not working right yet)

AlleleFrequencyWalker:
5. Made FORCE_1BASE_PROBS not static in AlleleFrequencyWalker (an argument should never be static! Jeez.)
6. changed quality_precision to be 1e-4 (Q40)
7. don't adjust by quality_precision unless the qual is actually zero.
8. added more asserts for NaN and Infinity
9. put in a correction for zero probs in P_D_q
10. changed pG to be hardy-weinberg in the presence of an allele frequency prior (duh)
11. rewrote binomialProb() to not overflow on deep coverage
12. rewrote nchoosek() to behave right on deep coverage
13. put in some binomailProb() tests in the main() routine (they come out right when compared with R)

Hunt for loci where 4bp should change things:
14. added FindNonrandomSecondBestBasePiles walker.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@471 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-19 15:35:07 +00:00
asivache df5aae5ed4 got read of a couple of warnings and added percentage(x,base) methods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@462 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 15:15:21 +00:00
depristo 72a3d84ed2 General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives identical output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:13:10 +00:00
kiran 998fad76c6 Some utility methods for creating pileups of secondary bases and secondary quals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@397 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 13:57:54 +00:00
depristo bb666ce392 Added mappingQualPileup function for use in the verbose mode of Pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@391 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:51:26 +00:00
jmaguire f39092526d Added function RandomSubset
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@379 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:14:53 +00:00
depristo 00722e19bc The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@319 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:19:54 +00:00
ebanks 3f75fc4e83 Unfortunately, because BWA occasionally outputs crazy reads, we need
to make sure not to have an ArrayIndexOutOfBoundsException thrown.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@297 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 03:51:35 +00:00
ebanks 42eb356782 1. modifed by read traversals with indexes to be more general
2. GenomeLocs for reads should have ends spanning the read
   (moved it to GenomeLoc from Utils)
3. Got rid of those stupid unmappable characters from comments in various files


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@289 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 18:24:08 +00:00
depristo d952790258 GFF now parses attributes correctly and efficiently. Slightly better interface to Utils.join
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@253 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 22:54:38 +00:00
depristo 385736469c High performance pileup code and utilities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@242 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 00:47:47 +00:00
depristo d7c0bcc223 Reorganized GenomeLoc code to more clearly and better use the picard SequenceDictionary information.
All GenomeLoc[] are not ArrayList<GenomeLoc> for clarity and consistency
Parsing now recursively merges contiguous elements chr1:1-10;chr1:11-20 => chr1:1-20
Added support for TraversingByLoci over all reference positions specified by the provided location array.  System dynamically determines which traversal system to use.
Pileup now marks, very clearly, reference positions without covered reads.
Made changes around the codebase to deal with new GenomeLoc structure.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@218 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-28 20:37:27 +00:00
aaron 230c1ad161 moved a bunch of files over to the logging system. In some cases I ballparked the severity level of an error, so if you see something wrong feel free to make changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@211 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 14:02:55 +00:00
depristo d11bb0fc64 Added xReadLines class to utils. It is a iterator<string> and iterable<string> so you can easily read all lines from a file. It's been used to simplify the code to process intervals, and will be used to add merging data support to the system...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@187 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:17:38 +00:00
hanna 63cd1fe201 Push core / playground lower into the tree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@160 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-23 23:19:54 +00:00