Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
hanna 2009-06-08 21:11:44 +00:00
parent 58f7ae8628
commit 127c321d0a
2 changed files with 14 additions and 11 deletions


@@ -1,17 +1,20 @@
 Read Quality Recalibrator
 -------------------------
-The tools in this package recalibrate quality scores
-assigned to nucleic acids in an aligned BAM file by
-analyzing the covariation between machine reported
-quality scores and:
+The tools in this package recalibrate quality scores of
+Illumina reads in an aligned BAM file. After recalibration,
+the quality scores in the QUAL field in each Illumina read
+in the output BAM are accurate in that the reported quality
+score is equal to its actual probability of mismatching.
+This process is accomplished by analyzing the covariation
+between machine reported quality scores and
 1) the position within the read, and
 2) the preceding nucleotide (sequencing chemistry effect).
-The aligned reads have their dbSNP sites masked out, and
-the mismatched bases are used as a metric for the true error
-rate of the system. The error rate at different dinucleotides
-and positions in the read is then fed into a logistic regression
+The aligned reads have their dbSNP sites masked out, and the
+mismatched bases are used as a metric for the true error rate
+of the system. The error rate at different dinucleotides and
+positions in the read is then fed into a logistic regression
 system which outputs a correction factor for each of those
 combinations which are then used to output a recalibrated BAM
 file.
@@ -79,7 +82,7 @@ directory at any time.
 Known Issues
 ------------
 - The recalibrator places severe memory demands on
-files with large numbers of read groups (> 1000).
+files with large numbers of read groups.
 - If running in 'evaluation' mode (see the 'Running'
 section above), X11 is required to generate the
 graphs. If running on a machine via ssh, be certain
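
Note on the README text in this file: the tabulation step it describes (mismatch rates binned by reported quality, read position, and dinucleotide, with known variant sites masked out) can be illustrated with a minimal sketch. This is not the recalibrator's actual code. The function names, the tuple layout of an observation, and the add-one smoothing are all assumptions made for illustration, and the logistic regression step that the README mentions is deliberately left out.

```python
import math
from collections import defaultdict


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)


def empirical_qualities(observations, known_sites=frozenset()):
    """Tabulate empirical quality per (reported quality, position, dinucleotide).

    observations: iterable of hypothetical tuples
        (site, reported_q, position, dinucleotide, is_mismatch)
    known_sites: sites to mask out (standing in for dbSNP positions), so that
        real variation is not counted as sequencing error.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total bases]
    for site, reported_q, position, dinuc, is_mismatch in observations:
        if site in known_sites:
            continue  # masked: a mismatch here may be a true variant
        tally = counts[(reported_q, position, dinuc)]
        tally[0] += int(is_mismatch)
        tally[1] += 1

    table = {}
    for key, (mismatches, total) in counts.items():
        # Add-one smoothing (an assumption of this sketch) keeps bins with
        # zero observed mismatches from producing an infinite quality.
        rate = (mismatches + 1) / (total + 2)
        table[key] = phred(rate)
    return table
```

Comparing each bin's empirical quality with its reported quality gives the kind of per-combination discrepancy that a correction model would then be fitted to.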


@@ -17,13 +17,13 @@ output_root = './'
 resources='resources/'
 # Where does the reference live?
-reference_base = resources + 'Homo_sapiens_assembly18'
+reference_base = resources + 'human_b36_both'
 reference = reference_base + '.fasta'
 reference_dict = reference_base + '.dict'
 reference_fai = reference_base + '.fasta.fai'
 # Where does DBSNP live?
-dbsnp = resources + 'dbsnp.rod.out'
+dbsnp = resources + 'dbsnp.1kg.rod.out'
 # Where are the application files required to run the recalibration?
 gatk = resources + 'gatk/GenomeAnalysisTK.jar'
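
Because this change renames both the reference and the dbSNP resource, a quick check that the new paths resolve may be useful before a run. The following is only a sketch: the path values are copied from the post-change configuration above, while the `required` dictionary and the `missing_files` helper are hypothetical and not part of the script.

```python
import os

# Values as they appear after this commit.
resources = 'resources/'
reference_base = resources + 'human_b36_both'

# Hypothetical grouping of the configured paths, for checking only.
required = {
    'reference': reference_base + '.fasta',
    'reference_dict': reference_base + '.dict',
    'reference_fai': reference_base + '.fasta.fai',
    'dbsnp': resources + 'dbsnp.1kg.rod.out',
    'gatk': resources + 'gatk/GenomeAnalysisTK.jar',
}


def missing_files(paths):
    """Return the sorted names of entries whose path does not exist on disk."""
    return sorted(name for name, path in paths.items()
                  if not os.path.exists(path))


if __name__ == '__main__':
    for name in missing_files(required):
        print('missing resource: %s -> %s' % (name, required[name]))
```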