ebanks
9f1d3aed26
-Output single filtration stats file with input from all filters
...
-move out isHet test to GenotypeUtils so all can use it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1369 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:44:21 +00:00
depristo
d88ea91939
Slight reorganization of genotype interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1368 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:19:11 +00:00
depristo
880a01cb5d
Slight reorganization of genotype interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1367 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:18:41 +00:00
depristo
d840a47b11
Slight reorganization of genotype interface
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:17:15 +00:00
depristo
20986a03de
cleanup before moving files
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1365 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:08:24 +00:00
ebanks
e3b08f245f
Pull out RMS calculation into MathUtils for all to use
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1364 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 17:00:20 +00:00
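For readers unfamiliar with the quantity, the RMS calculation named in the commit above can be sketched as follows. This is a minimal illustration in Python, not the MathUtils code itself; the function name `rms` and its signature are assumptions made for the example.

```python
import math


def rms(values):
    """Root mean square: sqrt of the mean of the squared values.

    Hypothetical helper for illustration; not the MathUtils implementation.
    """
    values = list(values)
    if not values:
        raise ValueError("rms() needs at least one value")
    return math.sqrt(sum(v * v for v in values) / len(values))
```

A typical use would be summarizing per-read mapping qualities at a site, e.g. rms([60, 60, 0, 37]).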
ebanks
e495b836d3
- added mapping quality filter
...
- make the filters brainless in that they strictly have thresholds and filter based on them; require user to calculate and input these thresholds.
- update filters in preparation for migration to new output format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1363 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 16:46:51 +00:00
ebanks
ba07f057ac
finish the math for RMS
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1362 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 16:18:09 +00:00
kiran
8bc925a216
Commit on behalf of Mark: cleaning up some old and busted code in GenotypeLikelihood and associated objects.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1361 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 21:18:30 +00:00
asivache
8d06bb21ed
A little gadget to select random samples from input stream(s) of unknown length. By default, selects a single line (with probability 1/TOTAL_NUMBER_OF_LINES_READ); with the -N option, randomly selects the specified number of lines. Can read from STDIN or from an arbitrary number of input streams (all streams will be merged). Examples:
 cat file1 file2 file3 | randomSampleFromStream.pl -N 5
or
 randomSampleFromStream.pl file1 file2 file3
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1360 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 18:55:14 +00:00
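Selecting a uniform random sample from a stream of unknown length, as the commit above describes, is commonly done with reservoir sampling. The sketch below shows that standard technique in Python; it is an assumption that randomSampleFromStream.pl works this way, and the name `reservoir_sample` and its parameters are invented for the example.

```python
import random


def reservoir_sample(lines, n=1, rng=None):
    """Return up to n items chosen uniformly at random from an iterable.

    Standard reservoir sampling (Algorithm R); hypothetical helper, not the
    Perl script's actual code. Only n items are held in memory at a time.
    """
    if rng is None:
        rng = random.Random()
    sample = []
    for seen, line in enumerate(lines, start=1):
        if len(sample) < n:
            # Fill the reservoir with the first n items.
            sample.append(line)
        else:
            # Keep the new item with probability n/seen, replacing a
            # uniformly chosen slot in the reservoir.
            j = rng.randrange(seen)
            if j < n:
                sample[j] = line
    return sample
```

With n=1, each line ends up selected with probability 1/(total lines read), matching the default behavior described in the commit message. Several input files can be merged by passing itertools.chain over the open file handles.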
aaron
9dfee7a75c
the "-genotype" option now acts correctly as a discovery mode caller in SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1359 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 18:31:45 +00:00
aaron
c2c80dd946
cleanup and moving some things around to more logical locations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1358 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 16:28:39 +00:00
sjia
9dada95ec3
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1357 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 16:21:16 +00:00
aaron
9a0761cd8f
accidentally committed some debug code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1356 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 15:25:22 +00:00
aaron
2f2c8576a5
GLF output is now well validated, and some changes for new Genotypes interface code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1355 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 15:21:28 +00:00
andrewk
afccbc44ec
Script that performs all the processing steps from raw Illumina reads through to analysis of barcoding and hybrid selection efficiency as documented in the GATK tutorial; can automatically run all steps in series on the farm.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1354 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:22:53 +00:00
andrewk
eb4b9a743a
Script that runs most of the steps involved in validating the CoverageEval system that predicts performance for given depth of sequencing coverage across a genome.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1353 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:18:45 +00:00
andrewk
8eeb87af2a
Tests for downsampling related utilities in ListUtils class that didn't get checked in earlier
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1352 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:09:35 +00:00
andrewk
efd0fd1f0a
Short python script that takes paired-end BAMs and aligns them with BWA. Referenced in GSA wiki tutorial
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1351 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 00:04:10 +00:00
andrewk
678c2533ca
Removed custom output stream for file and replaced with the standard out PrintStream
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1350 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:36:42 +00:00
aaron
2a7dfce9ae
fix the header string mismatch that Andrew found
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1349 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:26:34 +00:00
andrewk
44673b2dce
Removed a debugging println that was accidentally checked in
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1348 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:23:27 +00:00
andrewk
845488ff94
VariantEval now decides whether a variant is not confidently called using BestVsNextBest if genotypes are being evaluated and BestVsRef if not (variant discovery only). Also, the absolute value of the BestVsRef LOD (getVariantionConfidence) is used so that confident reference calls (if the GELI has output them) will show up in the final table as reference calls rather than no calls.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1347 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:54:06 +00:00
andrewk
1c648a2d5f
Skip compiled python files (*.pyc) in svn status output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1346 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:45:23 +00:00
andrewk
fdc7cc555b
Removed extra column name from geliHeaderString that was mislabeling the 10 genotype likelihoods by shifting them over by one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1345 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:42:02 +00:00
hanna
f3e63f00bc
Exclude secondary base caller code from playground jar. Still TODO: figure
...
out how to deal with the playground jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1344 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 21:02:46 +00:00
aaron
0087234ed7
small code cleanup, a couple of little changes to SSGGenotypeCall
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1343 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:47:37 +00:00
ebanks
fbc7d44bc7
don't allow users to input priors anymore; they should be using heterozygosity and having the SSG calculate priors.
...
Note that nothing was changed for dbSNP/HapMap priors (not sure what we want to do with these yet - any thoughts?)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1342 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:10:33 +00:00
ebanks
b282635b05
Complete reworking of Fisher's exact test for strand bias:
...
- fixed math bug (pValue needs to be initialized to pCutoff, not 0)
- perform factorial calculations in log space so that huge numbers don't explode
- cache factorial calculations so that each value needs to be computed just once for any given instance of the filter
I've tested it against R and it has held up so far...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1341 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 18:52:13 +00:00
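The two numerical ideas in the commit above, working with factorials in log space and caching them per instance, can be illustrated with a small two-sided Fisher's exact test. This is a sketch of the textbook test in Python under assumed names (`FisherExact`, `log_factorial`, `p_value`); it does not reproduce the filter's own code or its pCutoff handling.

```python
import math


class FisherExact:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Hypothetical illustration: log-space factorials avoid overflow, and the
    per-instance cache means each log(n!) is computed only once.
    """

    def __init__(self):
        self._log_fact = [0.0]  # log(0!) = 0

    def log_factorial(self, n):
        # Extend the cache incrementally: log(k!) = log((k-1)!) + log(k).
        while len(self._log_fact) <= n:
            k = len(self._log_fact)
            self._log_fact.append(self._log_fact[-1] + math.log(k))
        return self._log_fact[n]

    def _log_table_prob(self, a, b, c, d):
        # Log of the hypergeometric probability of one table.
        lf = self.log_factorial
        return (lf(a + b) + lf(c + d) + lf(a + c) + lf(b + d)
                - lf(a) - lf(b) - lf(c) - lf(d) - lf(a + b + c + d))

    def p_value(self, a, b, c, d):
        row1, col1, n = a + b, a + c, a + b + c + d
        log_obs = self._log_table_prob(a, b, c, d)
        total = 0.0
        # Enumerate every table with the same margins and sum the
        # probabilities of those no more likely than the observed one.
        for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
            lp = self._log_table_prob(x, row1 - x, col1 - x,
                                      n - row1 - col1 + x)
            if lp <= log_obs + 1e-7:  # small tolerance for float ties
                total += math.exp(lp)
        return min(1.0, total)
```

For strand bias, the four cells would typically be the forward and reverse read counts for the reference and alternate alleles. Results can be cross-checked against R's fisher.test, as the commit message mentions doing.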
aaron
4033c718d2
moving some code around for better organization, some fixes to the fields out of SSG
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1340 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 15:09:43 +00:00
ebanks
4366ce16e0
Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1339 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 14:53:27 +00:00
aaron
9cd53d3273
some initial changes from the first review of the genotype redesign, more to come.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1338 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 07:04:05 +00:00
ebanks
feb7238f10
Wasn't always returning the correct alt base
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1337 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 03:08:04 +00:00
hanna
5429b4d4a8
A bit of reorganization to help with more flexible output streams. Pushed construction of data
...
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
aaron
bca894ebce
Adding the initial changes for the new Genotyping interface. The bullet points are:
...
- SSG is much simpler now
- GeliText has been added as a GenotypeWriter
- AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependence on it
- GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup
There is still a lot more work to do, but this is a good initial check-in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:43:59 +00:00
kiran
c5c11d5d1c
First attempt at modifying the VFW interfaces to support direct emission of relevant training data per feature and exclusion criterion. This way, you could run the program once, get the training sets, and then feed that training set back to the filters and have them automatically choose the optimal thresholds for themselves. This current version is pretty ugly right now...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1334 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:29:03 +00:00
ebanks
3554897222
allow filters to specify whether they want to work with mapping quality zero reads; the VariantFiltrationWalker passes in the appropriate contextual reads
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1333 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 17:38:15 +00:00
hanna
7a13647c35
Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
...
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
depristo
56f769f2ce
Output improvements to GenotypeConcordance calculations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1331 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 12:54:46 +00:00
ebanks
72dda0b85c
Fixed calculations for Mark
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1330 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 03:21:43 +00:00
ebanks
f0378db9b7
added accuracy numbers
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1329 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 01:38:33 +00:00
ebanks
a5a56f1315
At this point, we are convinced that the new priors are the way to go...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1328 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 17:25:25 +00:00
depristo
df4fd498c5
Improvements and bug fixes galore. (1) Now properly handles Q0 bases by filtering them out; you can disable this if you need to. (2) Support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to have more power to detect variants (see email too)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1327 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:21:38 +00:00
depristo
46643d3724
Improvements and bug fixes galore. (1) Now properly handles Q0 bases by filtering them out; you can disable this if you need to. (2) Support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to have more power to detect variants (see email too)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1326 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:21:27 +00:00
depristo
d665d9714f
By default now writes output to JOBID.lsf.output instead of going to email -- based on recommendations from the cancer group
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1325 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 13:18:58 +00:00
ebanks
3c4410f104
-add basic indel metrics to variant eval
...
-variants need a length method (can't assume it's a SNP)!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
kcibul
1d6d99ed9c
walk by reference
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1323 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 20:21:04 +00:00
ebanks
089ae85be7
1. output grep-able strings for genotype eval
...
2. free DB coverage from isSNP restriction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1322 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 17:36:59 +00:00
kcibul
1bca9409a4
calculate freestanding intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1321 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 16:40:27 +00:00
asivache
2499c09256
added minIndelCount (short: minCnt) command line argument. The call is made only if the number of reads supporting the consensus indel is equal to or greater than the specified value (default: 0, so only the minFraction filter is on in default runs!)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1320 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-27 15:22:51 +00:00
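The interaction between the two thresholds in the commit above can be sketched as below. This is only an illustration in Python: the function name, the parameter names other than the count default of 0, and the use of >= for the fraction test are assumptions, and min_fraction is left as a required argument because the source does not state its default.

```python
def should_call_indel(supporting_reads, total_reads, min_fraction, min_count=0):
    """Decide whether a consensus indel passes both thresholds.

    Hypothetical sketch of the described logic, not the walker's code.
    With min_count left at 0, only the fraction test has any effect.
    """
    if total_reads <= 0:
        return False
    # Count filter: require at least min_count supporting reads.
    if supporting_reads < min_count:
        return False
    # Fraction filter: require enough of the coverage to support the indel.
    return supporting_reads / total_reads >= min_fraction
```

For example, with 3 supporting reads out of 10 and min_fraction=0.25, the call is made by default but suppressed once min_count is raised to 5.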