909 Commits (05c5659053dba6a37ee04e54435b548e2abfc729)

Author SHA1 Message Date
aaron d58eeb7539 Don't cry wolf: only one warning is now emitted, instead of tons of warnings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1139 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:50:37 +00:00
hanna a3e0ec20c4 Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1138 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:46:35 +00:00
hanna 93da64db10 Update naming for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1136 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 22:03:21 +00:00
hanna e93f751bd7 First step in replacing the Hello, World! document. Revamped the HelloWalker and checked it into the source tree, created a special build file for it, and added it to the packaging tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1135 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 21:59:54 +00:00
ebanks 8d3dc57c3d Commit to emit in sorted order so we don't have to use /tmp
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1133 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:47:15 +00:00
aaron f5cba5a6bb Fixed genome loc to be immutable; the only way to change its values now is through the GenomeLocParser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
asivache 177d6d00b8 added setContigIndex(). NOTE: both setContig() and setContigIndex() are UNSAFE, as calling one does not automatically update the other, and there's also no validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1130 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 17:40:37 +00:00
depristo 9fca79ed62 Read groups are now sorted in the output data, for convenience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1129 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:50:44 +00:00
ebanks 08df4771c8 count X/N/etc. as mismatches for the NM attribute in the BAMs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1127 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:55 +00:00
kiran d412c5dc2f Updated to use SecondaryBaseAnnotator class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1126 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:43 +00:00
kiran e3cdf7ef4b A single class that can be handed reads for training and basecalling. When in training mode, we accumulate no more than 10000 reads and always replace the lowest-quality reads with superior quality reads. Thus, the training set always contains 10000 of the best reads available. After training is complete, the class can be interrogated to return the SQ tag for a given RawRead object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1125 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:03:15 +00:00
ebanks 8aa3b65e7f fix to guarantee emission in sorted order
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1122 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 13:48:41 +00:00
aaron 03f8177a53 When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1121 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:51:55 +00:00
aaron 1dcababad1 a fix to make the test run
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1120 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:24:32 +00:00
jmaguire a17bf145f6 fix to respond to the change in IndelLikelihood constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1119 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 19:05:33 +00:00
depristo 7ecc43e9a7 Fixed subtle null ptr exception discovered by Kiran. Now deals with the rare situation where you have only, say, Q28 bases at dbSNP sites, so you fail in the Table recalibration step with a null pointer error into the data structure indexed by quality score. Bases with a Q score above those seen before are not modified in any way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1118 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 18:57:42 +00:00
ebanks 95e2ae0171 Deal with reads whose ends are aligned off the end of a chromosome.
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
jmaguire 65a788f18a Added a ROD (SangerSNP) for parsing Sanger's chr20 pilot1 SNP calls.
Some doodling around with indel calling in an EM context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1116 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:32:12 +00:00
asivache ceeeec13b8 Computes a vector of the numbers of reads falling into successive intervals of specified length (e.g. number of reads per 1 Mbase)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1115 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:12:21 +00:00
ebanks eb74b16e39 updated what constitutes removing entropy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1113 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 18:29:00 +00:00
aaron d7d4298917 Some files to support generic genotype outputting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1112 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:43:41 +00:00
asivache 1a97c86f95 don't crash when an unmapped read is encountered, just write it into the output file, it should be ok
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1111 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:33:59 +00:00
hanna 491ed70b44 TraverseByLocusWindow -- assorted bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1109 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:51:38 +00:00
depristo 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
asivache 73caf5db15 This is, strictly speaking, NOT a GATK module. Standalone, picard-level executable, except that it uses a couple of GATK utils (GenomeLoc). Remaps alignments from a custom reference (such as transcriptome, hyb-sel etc) onto the 'master' reference
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1107 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:04:18 +00:00
kiran ee2af3b423 I committed this too soon... reverting...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1106 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:49:12 +00:00
hanna ad3a3aa350 First pass at getting lists of files / lists of interval arguments to work. Note that the interval
ROD system will throw up its hands and not deal with intervals at all if multiple interval files
are passed in (see JIRA GSA-95).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1105 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:44:23 +00:00
kiran 23680a9a16 Replaced an expensive sort with an inexpensive direct computation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1104 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:25:12 +00:00
ebanks 83816fb801 Stop using the annoying refIterator (temp change until new traversal is green lighted)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1103 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:05:39 +00:00
aaron 0c3aabd1c5 logger output should be less verbose by default. Also fixed a printout in my read validation walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1102 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:47:29 +00:00
kcibul 11d83ac7d0 pushing up to test on unix box
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1101 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:00:48 +00:00
ebanks 0d9041380d remove printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
aaron 0a16519aa2 a couple of additions to the tests, plus a change to the artificial resource pool to support the queryContained flag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1099 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:30:32 +00:00
jmaguire 2c97c5e873 Compute a simple histogram of depth of coverage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1098 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:30:11 +00:00
hanna 102b38c055 Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1097 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:20:56 +00:00
aaron 4e04370f14 forgot a file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1096 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:56:17 +00:00
aaron 5b1c23a7f2 changes to fix and test the interval based traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
kcibul 3b24264c2b incorporating skew check, further output of metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1094 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 16:01:07 +00:00
ebanks ea2426dcd0 one more change needed to commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1093 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 15:09:53 +00:00
ebanks 347608cfe0 remove hacked traversal in preparation for move to Matt's new one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1091 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:32:05 +00:00
ebanks 940d75171a Big cleaner changes:
1. Added a Walker to merge intervals before cleaning
2. (Almost) all Walkers can filter out 454 reads (and do by default)
3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1090 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:31:24 +00:00
asivache 3cb6d7048e don't freak out if two of the reference intervals that a custom contig is built from are strictly adjacent; instead politely warn user that her data suck and proceed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1089 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 19:08:10 +00:00
asivache d4f3ca1a10 A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1088 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 18:16:15 +00:00
kiran 69dc502174 I forgot that this depends on BoundedScoringSet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1087 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 17:18:53 +00:00
aaron 61ce4e5983 quick doc change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1086 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 16:35:46 +00:00
asivache a9c30c5fcc added -nosort cmdline flag; if specified, the output writer does not attempt to sort reads on the fly (sorting involves use of sorting collection backed up by temporary disk storage and can lead to crashes if temp size is low and/or filesystem is not behaving). Output can be later sorted externally by samtools
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1085 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:58:00 +00:00
kiran 7b5d8d7604 Changed the intensities array order from cycle,channel to channel,cycle. This, I'm told, is a far more efficient allocation strategy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1084 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:41:06 +00:00
kiran 3112302ec9 A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1083 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:39:47 +00:00
depristo 0a50f2e160 Updated and near final version of tabular recalibration system. Uses 'yates' correction for low-occupancy quality bins. Faster and more robust handling of input and output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1082 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 03:52:12 +00:00
hanna ef546868bf Pooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude.
Desperately needs cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1080 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 23:48:06 +00:00