Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
hanna 2009-06-08 21:11:44 +00:00
parent 58f7ae8628
commit 127c321d0a
2 changed files with 14 additions and 11 deletions

@@ -1,17 +1,20 @@
 Read Quality Recalibrator
 -------------------------
-The tools in this package recalibrate quality scores
-assigned to nucleic acids in an aligned BAM file by
-analyzing the covariation between machine reported
-quality scores and:
+The tools in this package recalibrate quality scores of
+Illumina reads in an aligned BAM file. After recalibration,
+the quality scores in the QUAL field in each Illumina read
+in the output BAM are accurate in that the reported quality
+score is equal to its actual probability of mismatching.
+This is process is accomplished by analyzing the covariation
+between machine reported quality scores and
 1) the position within the read, and
 2) the preceding nucleotide (sequencing chemistry effect).
-The aligned reads have their dbSNP sites masked out, and
-the mismatched bases are used as a metric for the true error
-rate of the system. The error rate at different dinucleotides
-and positions in the read is then fed into a logistic regression
+The aligned reads have their dbSNP sites masked out, and the
+mismatched bases are used as a metric for the true error rate
+of the system. The error rate at different dinucleotides and
+positions in the read is then fed into a logistic regression
 system which outputs a correction factor for each of those
 combinations which are then use to output a recalibrated BAM
 file.
@@ -79,7 +82,7 @@ directory at any time.
 Known Issues
 ------------
 - The recalibrator places severe memory demands on
-files with large numbers of read groups (> 1000).
+files with large numbers of read groups.
 - If running in 'evaluation' mode (see the 'Running'
 section above), X11 is required to generate the
 graphs. If running on a machine via ssh, be certain
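
Note on the recalibration idea described in the first hunk above: the sketch below illustrates how mismatches at non-dbSNP sites could be turned into an empirical quality score per (reported quality, read position, dinucleotide) combination. It is a simplified illustration, not the tool's implementation: it uses plain binned counts with additive smoothing instead of the logistic regression the documentation mentions, and the function name, the tuple layout of the observations, and the smoothing parameter are all assumptions made for this example.

```python
import math
from collections import defaultdict


def empirical_qualities(observations, smoothing=1):
    """Estimate a Phred-scaled quality for each covariate bin.

    observations: iterable of hypothetical tuples
        (reported_q, position, dinucleotide, is_mismatch, is_known_snp)
    Returns a dict mapping (reported_q, position, dinucleotide) to an
    empirical quality, Q = -10 * log10(error rate).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total bases]
    for reported_q, position, dinuc, is_mismatch, is_known_snp in observations:
        if is_known_snp:
            # Known variant sites are masked out, so a mismatch there is
            # not counted as a sequencing error.
            continue
        bin_counts = counts[(reported_q, position, dinuc)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1

    table = {}
    for key, (mismatches, total) in counts.items():
        # Additive smoothing keeps the error rate strictly between 0 and 1,
        # so the logarithm is always defined.
        error_rate = (mismatches + smoothing) / (total + 2 * smoothing)
        table[key] = -10.0 * math.log10(error_rate)
    return table
```

A bin whose reported quality is 30 but whose empirical quality comes out near 17 would indicate that the machine is overconfident for that position and dinucleotide, which is the kind of discrepancy a correction factor is meant to remove.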

@@ -17,13 +17,13 @@ output_root = './'
 resources='resources/'
 # Where does the reference live?
-reference_base = resources + 'Homo_sapiens_assembly18'
+reference_base = resources + 'human_b36_both'
 reference = reference_base + '.fasta'
 reference_dict = reference_base + '.dict'
 reference_fai = reference_base + '.fasta.fai'
 # Where does DBSNP live?
-dbsnp = resources + 'dbsnp.rod.out'
+dbsnp = resources + 'dbsnp.1kg.rod.out'
 # Where are the application files required to run the recalibration?
 gatk = resources + 'gatk/GenomeAnalysisTK.jar'
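
Note on the configuration change above: because the reference, its dictionary, its index, and the dbSNP file are all derived from two string prefixes, a cutover like this one can silently point at files that are not present. The sketch below shows one way to check the derived paths before a run. The path values are copied from the new side of the diff; the `required_files` dict and the `missing_resources` helper are hypothetical names introduced for this example and are not part of the committed script.

```python
import os

# Values as they appear on the new side of the diff.
resources = 'resources/'
reference_base = resources + 'human_b36_both'

# Hypothetical grouping of the derived paths, for checking only.
required_files = {
    'reference': reference_base + '.fasta',
    'reference_dict': reference_base + '.dict',
    'reference_fai': reference_base + '.fasta.fai',
    'dbsnp': resources + 'dbsnp.1kg.rod.out',
}


def missing_resources(paths):
    """Return the sorted names of entries whose file does not exist."""
    return sorted(name for name, path in paths.items()
                  if not os.path.isfile(path))


if __name__ == '__main__':
    missing = missing_resources(required_files)
    if missing:
        print('Missing resource files: ' + ', '.join(missing))
```

Checking all four paths together matters here because the `.dict` and `.fasta.fai` files must belong to the same reference build as the `.fasta` file they accompany.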