Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
parent 58f7ae8628
commit 127c321d0a
@@ -1,17 +1,20 @@
 Read Quality Recalibrator
 -------------------------
-The tools in this package recalibrate quality scores
-assigned to nucleic acids in an aligned BAM file by
-analyzing the covariation between machine reported
-quality scores and:
+The tools in this package recalibrate quality scores of
+Illumina reads in an aligned BAM file. After recalibration,
+the quality scores in the QUAL field in each Illumina read
+in the output BAM are accurate in that the reported quality
+score is equal to its actual probability of mismatching.
+This process is accomplished by analyzing the covariation
+between machine reported quality scores and

 1) the position within the read, and
 2) the preceding nucleotide (sequencing chemistry effect).

-The aligned reads have their dbSNP sites masked out, and
-the mismatched bases are used as a metric for the true error
-rate of the system. The error rate at different dinucleotides
-and positions in the read is then fed into a logistic regression
+The aligned reads have their dbSNP sites masked out, and the
+mismatched bases are used as a metric for the true error rate
+of the system. The error rate at different dinucleotides and
+positions in the read is then fed into a logistic regression
 system which outputs a correction factor for each of those
 combinations which are then used to output a recalibrated BAM
 file.

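The updated README's description can be read as a counting procedure: mask known dbSNP sites, tally mismatches per bin of (reported quality, position in read, preceding base), and convert each bin's observed error rate back to a Phred-scaled quality, Q = -10 * log10(p). The Python sketch below is illustrative only. The tuple input format, the function names `phred` and `empirical_quality_table`, and the add-one smoothing are hypothetical, and the sketch substitutes a simple per-bin empirical error rate for the logistic regression the README actually describes.

```python
import math
from collections import defaultdict


def phred(error_rate: float) -> float:
    """Convert an error probability p to a Phred-scaled quality, Q = -10 * log10(p)."""
    return -10.0 * math.log10(error_rate)


def empirical_quality_table(observations, known_sites):
    """Tally mismatches per covariate bin and return an empirical quality per bin.

    observations: iterable of (ref_pos, read_pos, prev_base, reported_q, is_mismatch)
        tuples, one per aligned base (hypothetical input format).
    known_sites: set of reference positions to mask out (e.g. dbSNP sites).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, total bases]
    for ref_pos, read_pos, prev_base, reported_q, is_mismatch in observations:
        if ref_pos in known_sites:
            continue  # known variation is not counted as sequencing error
        key = (reported_q, read_pos, prev_base)
        counts[key][0] += int(is_mismatch)
        counts[key][1] += 1

    table = {}
    for key, (mismatches, total) in counts.items():
        # Add-one (Laplace) smoothing keeps the rate strictly between 0 and 1,
        # so log10 is always defined (an assumption, not the tool's method).
        rate = (mismatches + 1) / (total + 2)
        table[key] = phred(rate)
    return table


if __name__ == "__main__":
    obs = [
        (100, 0, "A", 30, False),
        (101, 1, "C", 30, True),
        (102, 1, "C", 30, False),
        (200, 1, "C", 30, True),  # falls on a masked dbSNP site, so it is skipped
    ]
    table = empirical_quality_table(obs, known_sites={200})
    print(round(table[(30, 1, "C")], 2))  # prints 3.01
```

In this toy run the bin (Q30, position 1, preceding C) keeps 1 mismatch out of 2 counted bases once the dbSNP site is masked, so its smoothed error rate is 0.5 and its empirical quality is about 3, far below the reported 30.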
@@ -79,7 +82,7 @@ directory at any time.
 Known Issues
 ------------
 - The recalibrator places severe memory demands on
-files with large numbers of read groups (> 1000).
+files with large numbers of read groups.
 - If running in 'evaluation' mode (see the 'Running'
 section above), X11 is required to generate the
 graphs. If running on a machine via ssh, be certain

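The read-group memory issue is consistent with a covariate table kept separately for each read group, whose size would then grow linearly with the number of read groups. The arithmetic below is a hypothetical back-of-the-envelope sketch only: the per-read-group layout, 64 quality levels, 16 dinucleotides, 36 bp reads, and 16 bytes per cell are all assumptions, not figures taken from the tool.

```python
def covariate_cells(read_groups: int, read_length: int,
                    quality_levels: int = 64, dinucleotides: int = 16) -> int:
    """Hypothetical number of counter cells if every read group gets its own
    table over (reported quality, position in read, dinucleotide)."""
    return read_groups * read_length * quality_levels * dinucleotides


def estimated_bytes(read_groups: int, read_length: int,
                    bytes_per_cell: int = 16) -> int:
    """Assumes two 64-bit counters per cell (mismatches and total bases) and
    ignores container overhead, which in practice would add more."""
    return covariate_cells(read_groups, read_length) * bytes_per_cell


if __name__ == "__main__":
    print(covariate_cells(1000, 36))  # prints 36864000
    print(estimated_bytes(1000, 36))  # prints 589824000
```

Under these assumptions, doubling the read groups doubles the table, which is why a file with thousands of read groups can dominate memory even though each individual table is small.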
@@ -17,13 +17,13 @@ output_root = './'
 resources='resources/'

 # Where does the reference live?
-reference_base = resources + 'Homo_sapiens_assembly18'
+reference_base = resources + 'human_b36_both'
 reference = reference_base + '.fasta'
 reference_dict = reference_base + '.dict'
 reference_fai = reference_base + '.fasta.fai'

 # Where does DBSNP live?
-dbsnp = resources + 'dbsnp.rod.out'
+dbsnp = resources + 'dbsnp.1kg.rod.out'

 # Where are the application files required to run the recalibration?
 gatk = resources + 'gatk/GenomeAnalysisTK.jar'

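In the configuration hunk only `reference_base` changes; the FASTA, sequence dictionary, and index paths are all derived from it. Note that the `.dict` path replaces the `.fasta` extension while the `.fai` path is appended to it. The sketch below mirrors that derivation; `reference_files` and the `missing_files` pre-flight check are hypothetical helpers, not part of the original script.

```python
import os


def reference_files(resources: str, reference_name: str) -> dict:
    """Derive the reference paths the same way the config does: the .dict
    replaces the .fasta extension, while the .fai is appended to it."""
    base = resources + reference_name
    return {
        "reference": base + ".fasta",
        "reference_dict": base + ".dict",
        "reference_fai": base + ".fasta.fai",
    }


def missing_files(paths: dict) -> list:
    """Return the names of entries whose files do not exist on disk
    (hypothetical pre-flight check)."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


if __name__ == "__main__":
    paths = reference_files("resources/", "human_b36_both")
    print(paths["reference_fai"])  # prints resources/human_b36_both.fasta.fai
    print(missing_files(paths))
```

Because all three paths follow from the one base name, cutting over to `human_b36_both` also requires its `.dict` and `.fasta.fai` companions to be present in `resources/`, which a check like `missing_files` would catch before a run starts.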