Commit Graph

707 Commits (ec68ae3bc5a0b3a1fd753675fa0fd3b6eb30d297)

Author SHA1 Message Date
andrewk 44673b2dce Removed a debugging println that was accidentally checked in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1348 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:23:27 +00:00
andrewk 845488ff94 VariantEval now decides whether a variant is not confidently called using BestVsNextBest if genotypes are being evaluated and BestVsRef if not (variant discovery only). Also, the absolute value of the BestVsRef LOD (getVariantionConfidence) is used so that confident reference calls (if the GELI has output them) will show up in the final table as reference calls rather than no calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1347 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:54:06 +00:00
andrewk fdc7cc555b Removed extra column name from geliHeaderString that was mislabeling the 10 genotype likelihoods by shifting them over by one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1345 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:42:02 +00:00
aaron 0087234ed7 small code cleanup, a couple of little changes to SSGGenotypeCall
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1343 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:47:37 +00:00
ebanks fbc7d44bc7 don't allow users to input priors anymore; they should be using heterozygosity and having the SSG calculate priors.
Note that nothing was changed for dbSNP/hapmap priors (not sure what we want to do with these yet - any thoughts?)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1342 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:10:33 +00:00
ebanks b282635b05 Complete reworking of Fisher's exact test for strand bias:
- fixed math bug (pValue needs to be initialized to pCutoff, not 0)
- perform factorial calculations in log space so that huge numbers don't explode
- cache factorial calculations so that each value needs to be computed just once for any given instance of the filter

I've tested it against R and it has held up so far...



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1341 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 18:52:13 +00:00
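
The three points in commit b282635b05 above can be illustrated with a minimal sketch. This is not the filter's actual code: it is a Python illustration that assumes a two-sided test on a 2x2 strand table, and the names log_factorial, log_table_prob and fisher_exact_two_sided are hypothetical. Including the observed table in the sum plays the same role as initializing pValue to pCutoff.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def log_factorial(n: int) -> float:
    """Return log(n!); cached so each value is computed only once."""
    return math.lgamma(n + 1)


def log_table_prob(a: int, b: int, c: int, d: int) -> float:
    """Log hypergeometric probability of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return (log_factorial(a + b) + log_factorial(c + d)
            + log_factorial(a + c) + log_factorial(b + d)
            - log_factorial(n)
            - log_factorial(a) - log_factorial(b)
            - log_factorial(c) - log_factorial(d))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum the probabilities of all tables no more likely than the observed one."""
    log_cutoff = log_table_prob(a, b, c, d)
    row1, col1, n = a + b, a + c, a + b + c + d
    p_value = 0.0
    # Enumerate every table with the same margins as the observed table.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        log_p = log_table_prob(x, row1 - x, col1 - x, n - row1 - col1 + x)
        # Small tolerance so equally likely tables are not lost to rounding.
        if log_p <= log_cutoff + 1e-9:
            p_value += math.exp(log_p)
    return min(p_value, 1.0)
```

Working in log space means the factorials of large read counts never appear as raw numbers, so high-coverage sites do not overflow.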
aaron 4033c718d2 moving some code around for better organization, some fixes to the fields out of SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1340 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 15:09:43 +00:00
ebanks 4366ce16e0 Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1339 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 14:53:27 +00:00
aaron 9cd53d3273 some initial changes from the first review of the genotype redesign, more to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1338 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 07:04:05 +00:00
hanna 5429b4d4a8 A bit of reorganization to help with more flexible output streams. Pushed construction of data
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.  


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
aaron bca894ebce Adding the initial changes for the new Genotyping interface. The bullet points are:
- SSG is much simpler now
- GeliText has been added as a GenotypeWriter
- AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependence on it
- GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup

There is still a lot more work to do, but this is a good initial check-in.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:43:59 +00:00
kiran c5c11d5d1c First attempt at modifying the VFW interfaces to support direct emission of relevant training data per feature and exclusion criterion. This way, you could run the program once, get the training sets, and then feed that training set back to the filters and have them automatically choose the optimal thresholds for themselves. This current version is pretty ugly right now...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1334 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:29:03 +00:00
ebanks 3554897222 allow filters to specify whether they want to work with mapping quality zero reads; the VariantFiltrationWalker passes in the appropriate contextual reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1333 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 17:38:15 +00:00
hanna 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
depristo 56f769f2ce Output improvements to GenotypeConcordance calculations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1331 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 12:54:46 +00:00
ebanks 72dda0b85c Fixed calculations for Mark
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1330 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 03:21:43 +00:00
ebanks f0378db9b7 added accuracy numbers
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1329 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 01:38:33 +00:00
ebanks a5a56f1315 At this point, we are convinced that the new priors are the way to go...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1328 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 17:25:25 +00:00
depristo df4fd498c5 Improvements and bug fixes galore. (1) Now properly handles Q0 bases, filtering them out, you can disable this if you need to (2) support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more empowered to detect variants (see email too)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1327 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:21:38 +00:00
depristo 46643d3724 Improvements and bug fixes galore. (1) Now properly handles Q0 bases, filtering them out, you can disable this if you need to (2) support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more empowered to detect variants (see email too)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1326 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:21:27 +00:00
ebanks 3c4410f104 -add basic indel metrics to variant eval
-variants need a length method (can't assume it's a SNP)!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
kcibul 1d6d99ed9c walk by reference
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1323 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 20:21:04 +00:00
ebanks 089ae85be7 1. output grep-able strings for genotype eval
2. free DB coverage from isSNP restriction


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1322 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 17:36:59 +00:00
kcibul 1bca9409a4 calculate freestanding intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1321 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 16:40:27 +00:00
asivache 2499c09256 added minIndelCount (short: minCnt) command line argument. The call is made only if the number of reads supporting the consensus indel is equal to or greater than the specified value (default: 0, so only minFraction filter is on in default runs!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1320 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 15:22:51 +00:00
ebanks 73ddf21bb7 SNPs no longer fail this filter if they are actually hom in reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1319 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 15:20:43 +00:00
asivache f2b3fa83ac fix for another bug found by Eric: some indels were printed into the output stream twice (when there's another indel within MISMATCH_WINDOW bases and that other indel requires delayed print in order to accumulate coverage)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1318 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 15:07:07 +00:00
asivache 5eca4c353c IndelGenotyper now uses GATK::getMergedReadGroupsByReaders() to sort out which read in the merged stream is for normal, and which is for tumor (in --somatic mode, apparently)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1316 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 23:01:18 +00:00
asivache 64221907a2 fixed a bug found by Eric: genotyper would crash in the case of an indel too close to the window end, with the next read mapping sufficiently far away on the ref
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1313 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 21:00:31 +00:00
hanna df44bdce7d Retire the pooled caller...it's been eclipsed by other walkers in the tree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1310 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 14:49:03 +00:00
kiran 884806fc16 Broken and unused. It goes away now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1309 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 14:26:52 +00:00
ebanks d044681fbe change paths to new ones
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1308 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 07:28:43 +00:00
ebanks 59f0c00d77 -set indel cleaning walkers to be in core package
-move Andrey's alignment utility classes to core


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1307 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 05:23:29 +00:00
kiran bb20462a7c A better way: down-scale second-base ratios until the infinities disappear. This way, high-coverage sites don't cause binomialProbability to explode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1306 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 03:02:00 +00:00
kiran 038cbcf80e If the result from the secondary-base test is 0.0, replace the result with a minimum likelihood such that the log-likelihood doesn't underflow.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1303 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 20:59:52 +00:00
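
The guard described in commit 038cbcf80e above can be sketched as follows. This is a Python illustration only, not the code from the commit; the floor value MIN_LIKELIHOOD and the function name safe_log10 are assumptions chosen for the example.

```python
import math

# Hypothetical floor; the value used in the actual commit is not stated.
MIN_LIKELIHOOD = 1e-300


def safe_log10(likelihood: float, floor: float = MIN_LIKELIHOOD) -> float:
    """Return log10(likelihood), substituting a floor for results of 0.0."""
    # log10(0.0) is undefined, so clamp the input before taking the log.
    return math.log10(max(likelihood, floor))
```

With the floor in place, a test result of 0.0 contributes a large but finite negative log-likelihood instead of an error or an infinity.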
kiran 093550a3f2 Removed secondary-base test from SingleSampleGenotyper. It now lives in the variant filtration system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1302 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 20:58:41 +00:00
ebanks 477502338f moved major indel cleaning pieces to core (yippee!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1301 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:59:51 +00:00
ebanks 4efe26c59a Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found.
Minor: refactor 454 filtering


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:53:51 +00:00
ebanks f8b1dbe3b3 getBestGenotype() does not necessarily return hets in alphabetical order;
the string (unfortunately) needs to be sorted for lookup in the table (otherwise we throw a NullPointerException)
TO DO: have the table be smarter instead of sorting each genotype string


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1298 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 01:58:47 +00:00
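
The workaround in commit f8b1dbe3b3 above (sorting each genotype string before the table lookup) can be sketched like this. It is a Python illustration under assumptions: the table GENOTYPE_INDEX, its contents, and the function names are hypothetical and only stand in for the real lookup table.

```python
# Hypothetical table keyed by genotypes with alleles in alphabetical order.
GENOTYPE_INDEX = {
    "AA": 0, "AC": 1, "AG": 2, "AT": 3, "CC": 4,
    "CG": 5, "CT": 6, "GG": 7, "GT": 8, "TT": 9,
}


def canonical_genotype(genotype: str) -> str:
    """Return the genotype with its alleles in alphabetical order."""
    return "".join(sorted(genotype))


def lookup(genotype: str) -> int:
    """Look up a genotype regardless of the order its alleles arrive in."""
    return GENOTYPE_INDEX[canonical_genotype(genotype)]
```

Without the sort, a het reported as "TA" would miss the "AT" key; the TO DO in the commit suggests moving this normalization into the table itself.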
ebanks ee8ed534e0 print full genotype for alt allele
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1297 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 01:35:23 +00:00
depristo 9c12c02768 AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1294 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 17:54:44 +00:00
ebanks c54fd1da09 Beautify the genotype concordance printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1291 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 02:53:02 +00:00
hanna 1843684cd2 Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1287 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:50:51 +00:00
hanna b43925c01e Switched to Reflections (http://code.google.com/p/reflections/) project for
inspecting the source tree and loading walkers, rather than trying to roll
our own by hand.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1286 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:32:22 +00:00
kiran 436a196e2b Bug fixes to support hapmap genotyping concordance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1285 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 16:20:10 +00:00
depristo 7e04313b4e Bug fixes and improvements to CoverageHistogram. Now displays the frequency of the bin. Also correctly prints out the last element in the coverage histogram (<= vs. <)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1284 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 11:55:05 +00:00
aaron b4adb5133a GLF rod as an AllelicVariant object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 00:55:52 +00:00
kiran f314ef8d84 Features and exclusion criteria are now instantiated in VariantFiltrationWalker's initialize() method, rather than in every map() call. This means the features and exclusion criteria will only ever be initialized once.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1281 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-20 22:47:21 +00:00
mmelgar 8da754eb4e First implementation of a primary base filter. Assumes on/off bases are distributed according to a binomial.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1278 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 18:43:35 +00:00
ebanks 24ebfee604 don't print traversal stats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1277 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 16:13:28 +00:00
ebanks f978b04633 A very simple walker to print out (using the ROD's toString method) all of
the RODs it sees.  This is the easiest solution to get around the (temporary)
bug of reads being seen multiple times by reads walkers when close intervals
are passed to them (i.e. process full contigs and then use a ref walker to
filter the ones within your intervals of choice)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1273 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 14:03:34 +00:00
hanna df1c61e049 Re-add the plugin path.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1271 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 22:48:44 +00:00
hanna 7c30c30d26 Cleaned up some duplicate code in preparation for making plugin dir configurable.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1270 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 22:02:21 +00:00
depristo 31f3f466ca Improvements to support GLF generation -- now correctly handles GLF
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1269 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 21:10:39 +00:00
depristo 0548026a2e Now understands GLFs for calculating genotyping results like callable bases, and avoids emitting stupid amounts of data when doing a genotype evaluation (i.e., ignores non-SNP() calls)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1267 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 21:03:26 +00:00
depristo c5f6ab3dd5 CoverageHistogram now sees 0 coverage sites
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1266 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 20:58:41 +00:00
ebanks 8bc0832215 Generate chip concordance table.
This should work, although I need to test it with some real GLFs


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1265 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 17:44:47 +00:00
kcibul e1055bcc4c moving to new external repository
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1261 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:46:08 +00:00
kcibul 4a730adfc1 committing latest changes before moving repositories
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1260 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:44:02 +00:00
ebanks a245ee32fa A walker to split 2 call sets into their intersection/union/disjoint (sub)sets.
Yes, the name is retarded, but I'm under pressure here...


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1258 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:20:47 +00:00
kcibul 00d49976fb committing latest changes before moving repositories
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1255 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 18:41:52 +00:00
aaron 9ecb3e0015 adding GLFRods with tests and some other code changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1251 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:30:19 +00:00
hanna c25f84a01c Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:41:37 +00:00
depristo 1798aff01b VariantEval now understands the difference between a population-level analysis and a genotype analysis, and handles both. All analyses annotated as supporting one or the other or both. Preparation for genotype chip concordance calculations as well as called sites, etc analyses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1247 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:07:13 +00:00
depristo 84d407ff3f Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:53:27 +00:00
aaron 7d755a4c90 GenotypeLikelihoods doesn't emit metrics, they don't make sense
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1236 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 17:22:28 +00:00
aaron 01fc8da270 adding the GenotypeLikelihoodsWalker, which generates GLF genotype likelihoods that are pretty much identical to the samtools calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1235 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:57:18 +00:00
aaron 36819ed908 Initial changes to the SSG to output GLF by default
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1231 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 08:46:04 +00:00
ebanks a1d33f8791 -Added walker to dump strand test results to file
-Refactored strand filter to handle calls from the walker


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1229 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 01:56:50 +00:00
ebanks 52659d02d4 ignore unmapped reads in all the indel walkers (since they're giving me overhead issues)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1224 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 16:51:11 +00:00
ebanks 4c02607297 genotyper also needs to have 454 reads filtered out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1221 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 23:19:28 +00:00
ebanks dea72c576e use the filter to ignore 454 reads in the traversal to speed up cleaning
(since there's less area to actually clean against)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1220 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 18:34:44 +00:00
asivache 1401606344 move warning about strictly adjacent intervals in a contig from 'remap' to 'read', so it is issued only once
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1218 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 17:58:11 +00:00
asivache e01d37024a now updates mapping quality (to an arbitrarily chosen value of 37 if the resulting mapping is unique) and X0, X1 tags after remapping (in REDUCE mode)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1216 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:40:52 +00:00
hanna 03e1713988 Better support for specifying read filters to apply directly from the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1212 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 23:59:53 +00:00
aaron d86717db93 Refactoring of the traversal engine base class, I removed a lot of old code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1209 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 21:57:00 +00:00
kcibul bc44e08225 refactored output logic
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1204 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 16:13:01 +00:00
ebanks 3fe7104963 Added walker to filter out clustered SNPs from a call set
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1203 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 03:16:27 +00:00
andrewk c8fcecbc6f Added ParseDCCSequenceData.py to repository and made changes that allow an analysis of quantity of sequence data by platform and project, moved table / record system to a new module called FlatFileTable.py and built that into ParseDCCSequenceData and CoverageEval.py; changed lod threshold in CoverageEvalWalker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1201 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 22:04:26 +00:00
hanna 433ad1f060 Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 18:49:08 +00:00
jmaguire 0a67386525 .
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1195 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:59:36 +00:00
kiran c78a72e775 Applies Fisher's Exact Test to determine whether there's a strand bias and, if so, filters the call out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1193 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:14:11 +00:00
kiran b211f500a3 Applies secondary base feature to variants.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1192 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:13:29 +00:00
kiran 6e31057e6b Some changes involving output of marginal calls to different, per-filter files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1191 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:12:57 +00:00
andrewk d3daecfc4d Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, update indices in CoverageEval.py, and made a lot of changes to CoverageWalker biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 02:05:40 +00:00
jmaguire 1db15ee468 made some things protected so that I can inherit them in MultiSampleCallerAccuracyTest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1185 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:50:28 +00:00
jmaguire 1fa71aa31d Now outputs stats. Doesn't do the downsampling thing because I think I'll have enough counts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1184 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:29:31 +00:00
depristo b9d533042e Two-tailed HardyWeinberg test implemented. VariantEval now separates violations from summary outputs for clarity; Fixing problems with CovariateCounterTest and TabularRodTest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1177 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:02:04 +00:00
mmelgar 6580211c2a First version of depth of coverage filter. Right now it takes in a maximum coverage threshold given by the user.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1175 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 18:22:46 +00:00
ebanks fac7ac5142 Don't print out 0 coverage (which is always 0)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1174 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 17:44:32 +00:00
kcibul 000d92a545 added gc calculation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1172 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 13:07:04 +00:00
ebanks 338cdbebad deal with screwy solid reads in the cleaner (no cigar strings)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1171 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:49:58 +00:00
jmaguire 8bcbf7f18a First draft of multi sample caller accuracy test.
Doesn't do its job yet but the pieces are in place.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1170 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:29:13 +00:00
jmaguire 4019cd2bd7 Added ROD for parsing hapmap3 genotype files.
Tweak to TabularROD to allow HapMapGenotypeROD to work.
Added HapMapGenotypeROD to list of RODs in ReferenceOrderedData.java.
Modified MultiSampleCaller to return a single object with most of the relevant information.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1169 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:28:24 +00:00
kcibul be2f8478c0 added suppression of failure messages
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1164 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 15:19:37 +00:00
kcibul 25c30b12bb added MAF-style output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1163 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 15:10:19 +00:00
andrewk dcb8892568 Lot of code for coverage evaluation tools including first version of python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement) which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1162 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 08:07:02 +00:00
asivache d603145cb0 Meaning of input arguments has CHANGED: minFraction is now a minimum fraction of CONSENSUS indel observation, out of all reads covering the site, required to make the call. minConsensusFraction is still the minimum fraction of CONSENSUS indel observation out of all indel observations at the site
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1160 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 20:38:10 +00:00
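
The two thresholds described in commit d603145cb0 above can be read as the following check. This is a Python sketch of the stated meaning only, not the genotyper's code; the function name and argument names are hypothetical, and no default threshold values are assumed.

```python
def passes_indel_fractions(consensus_count: int,
                           total_indel_count: int,
                           coverage: int,
                           min_fraction: float,
                           min_consensus_fraction: float) -> bool:
    """Apply both fraction thresholds to one candidate indel site."""
    if coverage <= 0 or total_indel_count <= 0:
        return False
    # minFraction: consensus indel observations out of all reads at the site.
    fraction_of_reads = consensus_count / coverage
    # minConsensusFraction: consensus observations out of all indel observations.
    fraction_of_indels = consensus_count / total_indel_count
    return (fraction_of_reads >= min_fraction
            and fraction_of_indels >= min_consensus_fraction)
```

For example, 6 consensus observations among 8 indel observations at a site covered by 20 reads give fractions of 0.30 and 0.75 respectively.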
kcibul 6a25f0b9c5 refactored into new package
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1158 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:37:54 +00:00
asivache c2e5a68aaf output format changed in --verbose --somatic mode: now also prints the <#reads with indels>/<coverage> for normal samples, rather than only for the tumor; also, code cleaned up a little
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1149 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:56:16 +00:00
andrewk 4cbf069de1 First version of coverage evaluation tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1148 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:52:25 +00:00
ebanks 76fd4b3848 deal with different contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1146 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 19:17:27 +00:00
ebanks 20fab507a8 Choose the REF if it scores equal to consensus!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1145 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:54:27 +00:00
ebanks 5a5103cfd2 Heads up, everyone: command-line args no longer need to be public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1143 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 16:09:22 +00:00
hanna 93da64db10 Update naming for consistency.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1136 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 22:03:21 +00:00
ebanks 8d3dc57c3d Commit to emit in sorted order so we don't have to use /tmp
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1133 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:47:15 +00:00
aaron f5cba5a6bb Fixed genome loc to be immutable, the only way to now change its values is through the GenomeLocParser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
ebanks 08df4771c8 count X/N/etc. as mismatches for the NM attribute in the BAMs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1127 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:55 +00:00
kiran d412c5dc2f Updated to use SecondaryBaseAnnotator class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1126 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:08:43 +00:00
ebanks 8aa3b65e7f fix to guarantee emission in sorted order
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1122 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 13:48:41 +00:00
jmaguire a17bf145f6 fix to respond to the change in IndelLikelihood constructor.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1119 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 19:05:33 +00:00
ebanks 95e2ae0171 Deal with reads whose ends are aligned off the end of a chromosome.
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
jmaguire 65a788f18a Added a ROD (SangerSNP) for parsing the Sanger's chr20 pilot1 SNP calls.
Some doodling around with indel calling in an EM context.
 



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1116 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:32:12 +00:00
asivache ceeeec13b8 Computes a vector of numbers of reads falling into successive intervals of specified length (e.g. numbers of reads per every 1Mbase)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1115 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:12:21 +00:00
ebanks eb74b16e39 updated what constitutes removing entropy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1113 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 18:29:00 +00:00
asivache 1a97c86f95 don't crash when an unmapped read is encountered, just write it into the output file, it should be ok
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1111 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:33:59 +00:00
depristo 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
asivache 73caf5db15 This is, strictly speaking, NOT a GATK module. Standalone, picard-level executable, except that it uses a couple of GATK utils (GenomeLoc). Remaps alignments from a custom reference (such as transcriptome, hyb-sel etc.) onto the 'master' reference
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1107 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:04:18 +00:00
hanna ad3a3aa350 First pass at making lists of files / lists of interval arguments work. Note that the interval
ROD system will throw up its hands and not deal with intervals at all if multiple interval files 
are passed in (see JIRA GSA-95). 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1105 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:44:23 +00:00
aaron 0c3aabd1c5 logger output should be less verbose by default. Also fixed a printout in my read validation walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1102 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:47:29 +00:00
kcibul 11d83ac7d0 pushing up to test on unix box
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1101 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:00:48 +00:00
ebanks 0d9041380d remove printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
jmaguire 2c97c5e873 Compute a simple histogram of depth of coverage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1098 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:30:11 +00:00
kcibul 3b24264c2b incorporating skew check, further output of metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1094 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 16:01:07 +00:00
ebanks 940d75171a Big cleaner changes:
1. Added a Walker to merge intervals before cleaning
2. (Almost) all Walkers can filter out 454 reads (and do by default)
3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1090 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:31:24 +00:00
asivache 3cb6d7048e don't freak out if two reference intervals a custom contig is built of are strictly adjacent; instead politely warn user that her data suck and proceed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1089 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 19:08:10 +00:00
asivache d4f3ca1a10 A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1088 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 18:16:15 +00:00
kiran 69dc502174 I forgot that this depends on BoundedScoringSet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1087 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 17:18:53 +00:00
asivache a9c30c5fcc added -nosort cmdline flag; if specified, the output writer does not attempt to sort reads on the fly (sorting involves use of sorting collection backed up by temporary disk storage and can lead to crashes if temp size is low and/or filesystem is not behaving). Output can be later sorted externally by samtools
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1085 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:58:00 +00:00
kiran 3112302ec9 A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1083 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:39:47 +00:00
asivache dfa2efbcf5 not crashing when refseq annotation track is not requested is a nice added feature
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1079 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 22:52:40 +00:00
kcibul eb999f880a incorporating skew check
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1078 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 19:51:51 +00:00
asivache 1339f3f3e3 make refseq annotation file an optional argument; if specified, indels will be annotated as genomic/utr/intron/coding (accidentally appearing 'unknowns' probably mean that there's something wrong with refseq annotations?)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1077 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 18:17:03 +00:00
aaron 9c0dba6979 Some quick documentation and typo changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1076 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 13:40:13 +00:00
ebanks cb9c6f18ef spelling fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1074 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 01:46:35 +00:00
kiran 630d9e6a37 Fixed a typo.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1073 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:37:46 +00:00
aaron 8b4d0412ca Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
ebanks 9e25229014 use better entropy threshold and don't print out "new" SNPs (since they're just an artifact of the low (arbitrary) threshold)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1070 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 15:30:08 +00:00
aaron bcb64d92e9 Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) into a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo 26eb362f52 Added novel / known split to variant eval. That is, emits all of the standard analyses on SNPs partitioned into those known in the provided known db and those novel. Also fixed problem with counting bases within subsets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
ebanks a21c2a7e48 don't make mapping quality too high
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1066 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 04:51:42 +00:00
ebanks 686c8133ed massive change in the way the cleaner works, mostly revolving around the fact
that we no longer trust indels from the alignments (although we do use it as
a good alternate consensus possibility).
Other changes include better "greedy mode" performance and allowing the user
to have just the cleaned reads themselves be printed out (mostly for Matt's
CleanedReadInjector).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1065 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 03:56:59 +00:00
hanna dde52e33eb Cleanup of the cleaned read injector based on Eric's feedback.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
kiran a0a3cf2f9f VariantFiltrationWalker can now apply specified exclusion tests after the feature tests. For a given variant, all reasons for exclusions are printed to screen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1061 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 21:12:01 +00:00
jmaguire 58b132ee10 Eliminate redundant computation.
Still room for more optimization, but I called chr20 (60Mb) in a couple hours on the queue this morning.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1058 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:31:57 +00:00
jmaguire 3a1b58ca65 remove unused argument lodThreshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1057 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 12:40:12 +00:00
kiran 9a0151b7e1 Added an option to list all available feature classes and exit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1056 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 00:00:12 +00:00
kiran ed7afd8b70 Added javadocs. Now throws an exception if an unknown feature is specified. General cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1055 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 23:28:38 +00:00
kiran 284fd6a5fb VariantFiltrationWalker now inspects its parent package and determines the list of features that can be applied. Command-line specification of filters to run looks at the simple names of these features and does a case-insensitive match to determine which features to apply. A new verbose mode allows the user to see how the likelihoods are changing with the application of each subsequent feature.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1054 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:45:36 +00:00
hanna af7a759ba4 Convert the somatic coverage tool to output from the packaging tool rather than from the dist target.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1050 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:29:30 +00:00
depristo 1bca144119 Moving things around
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1049 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:06:46 +00:00
depristo ca8a3bd85e Another temp checkin for rearranging things
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1048 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:04:36 +00:00
kiran a4fa02f11c Moved output outside of for loop so I don't have 10 different versions of the same variant (though, now that I think of it, that's not necessarily a terrible thing for debugging...)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1045 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:59:26 +00:00
kiran 768a16e791 An experimental, tile-parallel version of the secondary base annotator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1044 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:58:09 +00:00
kiran e26df45e8e Different features can now be specified by repeatedly supplying the -F "featurename:arguments" option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1043 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 18:45:03 +00:00
kiran 7a921c908c Can now adjust the genotype likelihoods of a variant returned from the rod. This automatically causes the lodBtr, lodBtnb, and genotype to be recomputed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1041 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:26:37 +00:00
kiran 9a7cec7d2e Directory to house variant calling and filtration tools.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1040 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:20:38 +00:00
jmaguire 5992d88409 skip N's in the reference (rather than crash. doh!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1039 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 23:22:35 +00:00
kiran 9ef391706c Added outputting of genotype posteriors to geli.calls file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1035 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:31:46 +00:00
kcibul 615572ea06 output to out... not System.out...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1034 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 20:43:10 +00:00
kcibul 673205ed5f additional output tweaking
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1028 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 15:37:38 +00:00
depristo 7d281296a7 Finishing checkin for building
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1027 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 14:12:40 +00:00
depristo d1e25bfe88 Intermediate checkin for safety -- now compiles
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1026 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:16:55 +00:00
depristo 2250769a42 Intermediate checkin for safety -- do not use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1025 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:07:19 +00:00
depristo 86c8c08375 Intermediate checkin for safety -- do not use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1024 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:06:24 +00:00
aaron 6ee64c7e43 added changes to support alec toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1021 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:15:56 +00:00
jmaguire 4f6d26849f Behold MultiSampleCaller!
Complete re-write of PoolCaller algorithm, now basically beta quality code. 

Improvements over PoolCaller include:

	- more correct strand test
	- fractional counts from genotypes (which means no individual lod threshold needed)
	- significantly cleaner code; first beta-quality code I've written since BaitDesigner so long ago.
	- faster, less likely to crash!	




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1020 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 20:03:24 +00:00
aaron b11c5a7cd5 doing some read validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1018 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 19:25:43 +00:00
asivache 010304fe44 bug: printing incorrect coordinates into output, finally fixed (?)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1017 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 18:08:56 +00:00
asivache 2259dc3a8f added filtering out indels with large levels of noise (mismatches) remaining in the close proximity; also a bug in recording deletion coordinates is fixed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1014 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 21:13:28 +00:00
ebanks a6477df6d1 Now optionally outputs whether "SNPs" are maintained/cleaned out/introduced by cleaning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1013 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 20:02:02 +00:00
ebanks 8f4bc8cb6e Move filtering functionality into the PrintReadsWalker. More to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1010 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 16:38:08 +00:00
kiran 161c74716c Forgot to change some direct references to variables in SSG. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1009 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 14:16:18 +00:00
kiran 9eeb5f79d4 Various refactoring to achieve hapmap and dbsnp awareness, the ability to set pop-gen and secondary base priors from the command-line, and general code cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1008 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 07:21:08 +00:00
kiran f2946fa3e8 Various refactoring to achieve hapmap and dbsnp awareness, the ability to set pop-gen and secondary base priors from the command-line, and general code cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1007 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 07:20:22 +00:00
ebanks f6af190b74 ignore clipped reads for realigning indel positions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1006 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 01:01:27 +00:00
asivache 811f560efb add refseq annotations to single sample calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1003 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:43:30 +00:00
asivache ca09a10b76 refseq annotation rod is now manually bound to tell coding indels from non-coding ones
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1001 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:27:37 +00:00
hanna 5859948e80 Fixed bugs in CleanedReadInjector arising from integration testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@999 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 17:37:33 +00:00
depristo fb7ba47fff Now does real neighbor distance calculation, as well as true SNP cluster counting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@998 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 16:29:26 +00:00
jmaguire dbf2cc037c don't have a null-pointer hissy fit when the reference is N.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@997 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 13:59:16 +00:00
asivache 4eda040e0f what used to be internal cutoff values are now exposed as cmdline parameters: minCoverage, minNormalCoverage, minFraction, minConsensusFraction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@995 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:22:52 +00:00
kiran 41687d5237 Added accessors for the prior probabilities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@994 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:16:10 +00:00
kiran 12dd18cdba Now aware of Hapmap and dbSNP sites. We *can* change the priors there, but we don't yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@993 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:15:34 +00:00
asivache d5cd883b99 bug fixed when a read with alignment end exactly at the window boundary and with last cigar element being an indel would cause index-out-of-bounds exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@992 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:03:15 +00:00
kiran a12009e9e7 Added a new constructor in which priors for hom-ref, het, and hom-var can be specified. Otherwise, it uses the default values of 0.999, 1e-3, and 1e-5 respectively.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@991 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:33:45 +00:00
kiran 909fefa40a Argumentized priors for hom-ref, het, and hom-var.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@990 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:32:44 +00:00
hanna 71e3825fa1 First pass of a walker for Eric that searches through an input BAM file for unclean reads, injecting the cleaned reads in their place and outputting the composite result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@989 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:18:13 +00:00
ebanks ffffe3b2f6 -Support for 1KG SNP calls in RODs
-Minor bug fix


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@987 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:56:37 +00:00
ebanks 599ceeddd8 Better method for downsampling deep regions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@983 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:57:40 +00:00
ebanks 4d9a88153a Update inferred insert size of cleaned reads when they are paired
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@982 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:29:13 +00:00
ebanks 3796654069 Added walker to emit intervals of clustered SNP calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@981 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 00:57:14 +00:00
aaron 94b0e46d12 checked in a sample xml file used to store the defaults for the SomaticCoverage tool, and added it to the SomaticCoverage.jar in build.xml. Also added an inputStream marshalling method to the GATKArgumentCollection.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@979 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:46:16 +00:00
asivache 8d25f1a105 should be a little faster
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@978 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:33:45 +00:00
aaron 026f68fb41 a couple of quick name changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@976 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:02:52 +00:00
ebanks b1f90635c1 1. downsample when there are too many mismatching reads (needs perfecting)
2. allow user to specify that no reads be emitted


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@974 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:55:42 +00:00
asivache 39dcd4f11f an attempt to bail out when unmapped reads are reached at the end of the file(s). still testing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@973 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:53:50 +00:00
asivache 030efc468f added naive ad-hoc cutoff for the pile size the cleaner will attempt to process; use --maxPileSize argument to force any pile larger than specified cutoff to be directly written to the output without cleaning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@972 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:52:35 +00:00
ebanks f9be175f44 Be smart about trying alternate consensuses:
try prior indels first and only 1 instance of them


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@971 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:43:22 +00:00
aaron f304803811 initial check-in of an easy way to create command line tools based on the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@970 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:34:02 +00:00
depristo 9ebcd6546d Convenience printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@968 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:07:38 +00:00
asivache 06e5a765f8 now has two modes: one sample - just call indel sites; two samples - call somatic-looking variants only. Still uses heuristic count-based cutoffs, cutoffs are hardcoded and are pretty conservative...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@967 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:41:38 +00:00
ebanks 5451bbfd5a -move final vars to command-line args
-Per Andrey: ignore indels from aligner when testing against alt consensus


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@966 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:39:00 +00:00
kiran 6bb7f7e9d8 Commented some stuff out so that things compile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@963 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:06:33 +00:00
kiran 87ba8b3451 Removed some useless code. Don't apply second-base test if the coverage is too high, since the binomial probs explode and return NaN or Infinite values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@961 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:27:06 +00:00
kiran a12ed404ce Changed method name from applyFourBaseDistributionPrior to applySecondBaseDistributionPrior. 'Cause that's how I roll.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@960 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:21:22 +00:00
hanna e77dfe9983 Allow script to be easily modified to support different platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 16:06:57 +00:00
depristo 7fa84ea157 10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
ebanks b45b1d5f2b border case bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@951 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 04:33:15 +00:00
asivache 13eb868536 helper class. array-like random access and fast shift. good for sliding windows (e.g. keeping coverage over last 100 bases while sliding along the reference)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@942 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:11:57 +00:00
asivache 3d6e738a60 still under development. does not genotype yet, but walks and talks (counts overall coverage and indel variant occurrences at every reference position)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@941 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:10:31 +00:00
ebanks 58f7ae8628 better filtering, plus deal with case where user doesn't input maxlength
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@939 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 18:44:29 +00:00
asivache b4ef16ced2 extractIndels() now should deal correctly with soft- and hard-clipped bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@936 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:04:49 +00:00
hanna e2ed56dc96 Add a MAX_READ_GROUPS sanity parameter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@934 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 13:57:43 +00:00
asivache 9f35a5aa32 Insidious bug: clipped sequences (S cigar elements) were a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such a sequence occurred at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates the interval reference stretch up to the rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases; at the boundary of the interval the clipped bases would stick outside the passed reference stretch and an index-out-of-bounds exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only the meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
ebanks 3a8219a469 use knowledge from other reads to find a consensus
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@932 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 21:22:17 +00:00
hanna 596773e6c6 Cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@931 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 20:25:08 +00:00
depristo 98396732ba Bug fixes for Andrey
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@930 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 18:19:51 +00:00
asivache b48508a226 indelRealignment() signature changed. The only difference about consensus sequences is that they are passed along with alignment cigars that start inside the sequence, while for 'conventional' reads cigar always starts at position 0 on the read. Logically, indelRealignment() should not know what 'consensus' is. Instead, now it receives an additional int parameter, start of the cigar on the 'read' sequence
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@929 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 17:42:19 +00:00
asivache 9eb38c0222 mostly synchronizing with the main branch. Based on anecdotal evidence (too few examples in the data), realignment (shifting indel left across a repeat) works correctly on non-homonucleotide repeats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@928 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 16:39:16 +00:00
ebanks c6634e3121 cleaned up some code and minor bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@927 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 03:14:21 +00:00
asivache 99c105790b Now indelRealignment should be correct... The old version could only condense to the left homo-nucleotide indels. New version should be able to detect and shift left arbitrary repeated sequence (e.g. deletion of ATA after ATAATAATA will be shifted left to the first occurrence of ATA on the ref!). NOT THOROUGHLY TESTED YET, will test tonight. ./somaticIndels.pl --dir . --cutoff 100 -filter EXON --mode SOMATIC --condense 5 --format bed > 0883.indel.somatic.exon.100.bed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@926 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 23:54:07 +00:00
hanna 40ac3b7816 Inject read group into covars_out file's toString output. Continue fixing systematic bug in the code where flattenData is not joined to the read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@924 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 20:43:28 +00:00
asivache 0bb4565798 added AlignmentUtils.getNumAlignmentBlocks(read) - a faster alternative to read.getAlignmentBlocks().size(); IntervalCleaner updated accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@923 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 19:35:21 +00:00
asivache 92b054b71b moved another variant of numMismatches to AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@922 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:07:48 +00:00
asivache 7018dd1469 moved another variant of numMismatches to AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@921 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:05:29 +00:00
hanna ac5b7dd453 Fixed order-of-operations bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@919 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 03:22:56 +00:00
depristo 819862e04e major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
asivache 400399f1b8 fixed (?) a bug in insertion realignment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@917 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 22:04:37 +00:00
hanna 34bb43a6c8 Saw that one of the offsets needed to be changed from -1 to -2 and changed the wrong damn offset. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@915 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 19:18:34 +00:00
ebanks 4623a34ad3 Fix bug in realigning insertion cigar strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@914 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 18:46:41 +00:00
ebanks 092a754071 Make sure indel position from SW alignment is leftmost possible
(and improve printouts)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@912 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:36:10 +00:00
ebanks 36fb6ca3c5 Allow user to specify the compression to be used when writing out BAM files.
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00
ebanks c1792de44f First pass at fixing the incorrect border-case behavior of the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@908 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 07:55:06 +00:00
hanna 9da04fd9ac Cleaned up error warning in case no PL groups are present.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@907 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 03:14:17 +00:00
hanna fdfc3abf80 Better handling for case where PL attribute is missing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@905 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 02:52:30 +00:00
hanna 9689bb3331 Very early draft of script integrating the covariate counting / logistic regression. Deleted some unused code and spurious debug info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@902 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:52:11 +00:00
ebanks 4d880477d6 Deal with ends of contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@900 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 20:09:53 +00:00
hanna 40bc4ae39a The building blocks for segmenting covariate counting data by read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@899 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 19:55:24 +00:00
depristo b492192838 Pairwise SNP distance metrics now enabled
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@892 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 00:11:29 +00:00
hanna 8672ae6019 Now seeing results from the training data. There are still some critical problems in the quality of the output, but we're at least getting training output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@891 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 20:41:07 +00:00
ebanks 4e41646c88 print out stats for Andrey
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@890 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 17:45:35 +00:00
andrewk dfe464cd81 Updated CovariateCounterWalker to be read group aware
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@889 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 10:06:06 +00:00
aaron 107b5d73b5 The flagStatReadWalker generates the exact same statistical output as the samtools flagstat command, so the two outputs can be diff'ed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@883 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 21:23:56 +00:00
kcibul a1218ef508 changed default value for failure output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@880 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 19:32:29 +00:00
depristo 7e7c83ddca fixing insidious bugs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@879 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 18:33:45 +00:00
kcibul ad5b057140 parameterized a bit more
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@877 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 17:58:26 +00:00
andrewk 587d07da00 Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@876 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 16:55:05 +00:00
kcibul c4cb867d74 basic clustering of reads to reduce artifacts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@873 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 02:54:21 +00:00
jmaguire 417f5b145e Strand test and misc touch-ups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@871 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 17:13:21 +00:00