From 127c321d0a7cb942174a876a29fa3f0b744d68da Mon Sep 17 00:00:00 2001
From: hanna <hanna@348d0f76-0448-11de-a6fe-93d51630548a>
Date: Mon, 8 Jun 2009 21:11:44 +0000
Subject: [PATCH] Cut over to 1kG version of fasta / reference.  Updated doc
 with latest version of tool summary.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
---
 doc/ReadQualityRecalibrator/README | 21 ++++++++++++---------
 python/RecalQual.py                |  4 ++--
 2 files changed, 14 insertions(+), 11 deletions(-)
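
Reviewer note: the updated README says a recalibrated quality score should reflect the actual probability of a mismatch. The sketch below illustrates that relationship using the standard Phred scale. It is not the logistic regression the recalibrator runs; the function names and the max_q cap are hypothetical and chosen only for illustration.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def empirical_quality(mismatches, observations, max_q=40):
    """Phred-scaled quality implied by an observed mismatch rate.

    max_q is an assumed cap, used when a bin has no observed mismatches.
    """
    if observations <= 0:
        raise ValueError("need at least one observed base")
    if mismatches == 0:
        return float(max_q)
    rate = float(mismatches) / observations
    return min(float(max_q), -10.0 * math.log10(rate))


if __name__ == "__main__":
    # Error probability the machine claims for a Q30 base.
    print(phred_to_error_prob(30))
    # Quality implied by 50 mismatches in 10000 non-dbSNP bases.
    print(empirical_quality(50, 10000))
```

For example, if bases reported at Q30 mismatch at 50 of 10000 non-dbSNP sites, the implied quality is roughly Q23, so the reported scores for that bin would be overconfident and need to be corrected downward.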

diff --git a/doc/ReadQualityRecalibrator/README b/doc/ReadQualityRecalibrator/README
index 97c7f3b12..3a1620e46 100644
--- a/doc/ReadQualityRecalibrator/README
+++ b/doc/ReadQualityRecalibrator/README
@@ -1,17 +1,20 @@
 Read Quality Recalibrator
 -------------------------
-The tools in this package recalibrate quality scores 
-assigned to nucleic acids in an aligned BAM file by 
-analyzing the covariation between machine reported 
-quality scores and: 
+The tools in this package recalibrate quality scores of 
+Illumina reads in an aligned BAM file. After recalibration, 
+the quality scores in the QUAL field of each Illumina read 
+in the output BAM are accurate in that the reported quality 
+score reflects the base's actual probability of mismatching.  
+This process is accomplished by analyzing the covariation 
+between machine-reported quality scores and: 
 
 1) the position within the read, and 
 2) the preceding nucleotide (sequencing chemistry effect).  
 
-The aligned reads have their dbSNP sites masked out, and 
-the mismatched bases are used as a metric for the true error 
-rate of the system.  The error rate at different dinucleotides 
-and positions in the read is then fed into a logistic regression 
+The aligned reads have their dbSNP sites masked out, and the 
+mismatched bases are used as a metric for the true error rate 
+of the system.  The error rate at different dinucleotides and 
+positions in the read is then fed into a logistic regression 
 system which outputs a correction factor for each of those 
 combinations which are then use to output a recalibrated BAM 
 file. 
@@ -79,7 +82,7 @@ directory at any time.
 Known Issues
 ------------
 - The recalibrator places severe memory demands on
-  files with large numbers of read groups (> 1000).
+  files with large numbers of read groups.
 - If running in 'evaluation' mode (see the 'Running'
   section above), X11 is required to generate the 
   graphs.  If running on a machine via ssh, be certain
diff --git a/python/RecalQual.py b/python/RecalQual.py
index 728bf22ef..eb70d22ab 100755
--- a/python/RecalQual.py
+++ b/python/RecalQual.py
@@ -17,13 +17,13 @@ output_root = './'
 resources='resources/'
 
 # Where does the reference live?
-reference_base = resources + 'Homo_sapiens_assembly18'
+reference_base = resources + 'human_b36_both'
 reference      = reference_base + '.fasta'
 reference_dict = reference_base + '.dict'
 reference_fai  = reference_base + '.fasta.fai'
 
 # Where does DBSNP live?
-dbsnp = resources + 'dbsnp.rod.out'
+dbsnp = resources + 'dbsnp.1kg.rod.out'
 
 # Where are the application files required to run the recalibration?
 gatk = resources + 'gatk/GenomeAnalysisTK.jar'