Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 21:11:44 +00:00 · 2009-06-08 21:11:44 +00:00 · 127c321d0a
parent 58f7ae8628
commit 127c321d0a
2 changed files with 14 additions and 11 deletions
--- a/doc/ReadQualityRecalibrator/README
+++ b/doc/ReadQualityRecalibrator/README
@@ -1,17 +1,20 @@
 Read Quality Recalibrator
 -------------------------
-The tools in this package recalibrate quality scores 
+The tools in this package recalibrate quality scores of 
-assigned to nucleic acids in an aligned BAM file by 
+Illumina reads in an aligned BAM file. After recalibration, 
-analyzing the covariation between machine reported 
+the quality scores in the QUAL field in each Illumina read 
-quality scores and: 
+in the output BAM are accurate in that the reported quality 
 score is equal to its actual probability of mismatching.  
 This process is accomplished by analyzing the covariation 
 between machine reported quality scores and 
 1) the position within the read, and 
 2) the preceding nucleotide (sequencing chemistry effect).  
-The aligned reads have their dbSNP sites masked out, and 
+The aligned reads have their dbSNP sites masked out, and the 
-the mismatched bases are used as a metric for the true error 
+mismatched bases are used as a metric for the true error rate 
-rate of the system.  The error rate at different dinucleotides 
+of the system.  The error rate at different dinucleotides and 
-and positions in the read is then fed into a logistic regression 
+positions in the read is then fed into a logistic regression 
 system which outputs a correction factor for each of those 
 combinations which are then used to output a recalibrated BAM 
 file. 
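The tabulation step described in the README can be sketched in Python, the language of RecalQual.py. This is a minimal illustration, not the recalibrator's own code. It assumes each aligned base arrives as a hypothetical tuple `(reported_qual, position, prev_base, base, ref_base, at_dbsnp)`. It masks dbSNP sites and counts observations and mismatches per (reported quality, position, dinucleotide) bin. The logistic regression step is not reproduced here. Instead, the sketch shows only the simpler empirical conversion of a bin's mismatch rate p to a Phred-scaled quality, Q = -10 log10(p). The add-one smoothing is an assumption of this sketch.

```python
import math
from collections import defaultdict


def phred(p):
    """Convert an error probability to a Phred-scaled quality: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def tabulate_errors(bases):
    """Count observations and mismatches per (reported quality, position, dinucleotide).

    `bases` is an iterable of hypothetical tuples:
    (reported_qual, position, prev_base, base, ref_base, at_dbsnp).
    prev_base is the base preceding this one in the read (the caller might pass
    'N' for the first base; that convention is an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [observations, mismatches]
    for qual, pos, prev_base, base, ref_base, at_dbsnp in bases:
        if at_dbsnp:
            # Masked: known variant sites would otherwise be counted as errors.
            continue
        key = (qual, pos, prev_base + base)
        counts[key][0] += 1
        if base != ref_base:
            counts[key][1] += 1
    return dict(counts)


def empirical_quality(observations, mismatches):
    """Empirical Phred quality of one bin, with add-one smoothing (an assumption)
    so that a bin with zero mismatches still gets a finite quality."""
    p = (mismatches + 1) / (observations + 2)
    return phred(p)


if __name__ == '__main__':
    example = [
        (30, 0, 'A', 'C', 'C', False),  # match
        (30, 0, 'A', 'C', 'G', False),  # mismatch, counted as an error
        (30, 0, 'A', 'C', 'T', True),   # mismatch at a dbSNP site, masked out
    ]
    for key, (obs, mm) in tabulate_errors(example).items():
        print(key, obs, mm, round(empirical_quality(obs, mm), 2))
```

Counting per bin keeps the error model transparent: each table cell is simply an observed mismatch rate. A regression stage like the one the README describes can then smooth across positions and dinucleotides that have too few observations.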
@@ -79,7 +82,7 @@ directory at any time.
 Known Issues
 ------------
 - The recalibrator places severe memory demands on
-  files with large numbers of read groups (> 1000).
+  files with large numbers of read groups.
 - If running in 'evaluation' mode (see the 'Running'
  section above), X11 is required to generate the 
  graphs.  If running on a machine via ssh, be certain
--- a/python/RecalQual.py
+++ b/python/RecalQual.py
@@ -17,13 +17,13 @@ output_root = './'
 resources='resources/'
 # Where does the reference live?
-reference_base = resources + 'Homo_sapiens_assembly18'
+reference_base = resources + 'human_b36_both'
 reference      = reference_base + '.fasta'
 reference_dict = reference_base + '.dict'
 reference_fai  = reference_base + '.fasta.fai'
 # Where does DBSNP live?
-dbsnp = resources + 'dbsnp.rod.out'
+dbsnp = resources + 'dbsnp.1kg.rod.out'
 # Where are the application files required to run the recalibration?
 gatk = resources + 'gatk/GenomeAnalysisTK.jar'
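The reference cut-over replaces the base name, and RecalQual.py derives the `.fasta`, `.dict` and `.fasta.fai` paths from it. A quick pre-flight check can confirm that every companion file exists for the new `human_b36_both` reference and the `dbsnp.1kg.rod.out` file. The helpers `required_paths` and `missing_paths` below are hypothetical and are not part of RecalQual.py. They only mirror the path naming shown in the diff.

```python
import os

# Values mirror the post-commit settings in RecalQual.py.
resources = 'resources/'
reference_base = resources + 'human_b36_both'


def required_paths(resources, reference_base):
    """Hypothetical helper: list every file the configuration points at,
    using the same suffixes RecalQual.py appends to reference_base."""
    return [
        reference_base + '.fasta',
        reference_base + '.dict',
        reference_base + '.fasta.fai',
        resources + 'dbsnp.1kg.rod.out',
        resources + 'gatk/GenomeAnalysisTK.jar',
    ]


def missing_paths(paths):
    """Hypothetical helper: return the configured paths that are not regular files."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == '__main__':
    for path in missing_paths(required_paths(resources, reference_base)):
        print('missing:', path)
```

Running this before a recalibration run would flag, for example, a new FASTA that was copied into `resources/` without its matching `.dict` or `.fai` file.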