Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@940 348d0f76-0448-11de-a6fe-93d51630548a
parent 58f7ae8628
commit 127c321d0a
@@ -1,17 +1,20 @@
 Read Quality Recalibrator
 -------------------------
-The tools in this package recalibrate quality scores
-assigned to nucleic acids in an aligned BAM file by
-analyzing the covariation between machine reported
-quality scores and:
+The tools in this package recalibrate quality scores of
+Illumina reads in an aligned BAM file. After recalibration,
+the quality scores in the QUAL field in each Illumina read
+in the output BAM are accurate in that the reported quality
+score is equal to its actual probability of mismatching.
+This process is accomplished by analyzing the covariation
+between machine reported quality scores and
 
 1) the position within the read, and
 2) the preceding nucleotide (sequencing chemistry effect).
 
-The aligned reads have their dbSNP sites masked out, and
-the mismatched bases are used as a metric for the true error
-rate of the system. The error rate at different dinucleotides
-and positions in the read is then fed into a logistic regression
+The aligned reads have their dbSNP sites masked out, and the
+mismatched bases are used as a metric for the true error rate
+of the system. The error rate at different dinucleotides and
+positions in the read is then fed into a logistic regression
 system which outputs a correction factor for each of those
 combinations which are then used to output a recalibrated BAM
 file.
@@ -79,7 +82,7 @@ directory at any time.
 Known Issues
 ------------
 - The recalibrator places severe memory demands on
-files with large numbers of read groups (> 1000).
+files with large numbers of read groups.
 - If running in 'evaluation' mode (see the 'Running'
 section above), X11 is required to generate the
 graphs. If running on a machine via ssh, be certain
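
The README text in this commit describes using mismatches at non-dbSNP sites as a proxy for the true error rate, broken down by position in the read and by dinucleotide. A minimal sketch of that tabulation step follows. It is not the recalibrator's actual code: the input tuple format, the function names, and the add-one smoothing are assumptions made for illustration, and the logistic regression stage is omitted.

```python
import math
from collections import defaultdict


def phred(error_rate, floor=1e-6):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(max(error_rate, floor))


def empirical_qualities(observations):
    """Tabulate an empirical quality per (reported Q, cycle, dinucleotide).

    `observations` is an iterable of
    (reported_q, cycle, dinuc, is_mismatch, in_dbsnp) tuples.
    This input format is hypothetical, chosen only for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [mismatches, bases]
    for reported_q, cycle, dinuc, is_mismatch, in_dbsnp in observations:
        if in_dbsnp:
            continue  # known variant sites are masked out, as the README states
        bin_counts = counts[(reported_q, cycle, dinuc)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1
    # Add-one smoothing (an assumption) keeps bins with zero mismatches finite.
    return {key: phred((mm + 1) / (n + 2)) for key, (mm, n) in counts.items()}
```

Comparing each bin's empirical quality with its reported quality gives the kind of per-combination discrepancy that the correction factors described above are meant to remove.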
@@ -17,13 +17,13 @@ output_root = './'
 resources='resources/'
 
 # Where does the reference live?
-reference_base = resources + 'Homo_sapiens_assembly18'
+reference_base = resources + 'human_b36_both'
 reference = reference_base + '.fasta'
 reference_dict = reference_base + '.dict'
 reference_fai = reference_base + '.fasta.fai'
 
 # Where does DBSNP live?
-dbsnp = resources + 'dbsnp.rod.out'
+dbsnp = resources + 'dbsnp.1kg.rod.out'
 
 # Where are the application files required to run the recalibration?
 gatk = resources + 'gatk/GenomeAnalysisTK.jar'
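
Because this commit renames the reference and dbSNP files that the configuration points at, a quick existence check before running can catch a resources directory that still holds only the old files. The sketch below rebuilds the same paths as the config hunk. The helper name `missing_inputs` and its parameters are hypothetical and not part of the package.

```python
import os


def missing_inputs(resources="resources/",
                   reference_name="human_b36_both",
                   dbsnp_name="dbsnp.1kg.rod.out"):
    """Return the sorted names of configured input files that do not exist.

    The defaults mirror the values in the config hunk; this function itself
    is an illustrative helper, not something the package provides.
    """
    reference_base = os.path.join(resources, reference_name)
    paths = {
        "reference": reference_base + ".fasta",
        "reference_dict": reference_base + ".dict",
        "reference_fai": reference_base + ".fasta.fai",
        "dbsnp": os.path.join(resources, dbsnp_name),
        "gatk": os.path.join(resources, "gatk/GenomeAnalysisTK.jar"),
    }
    return sorted(name for name, path in paths.items()
                  if not os.path.isfile(path))
```

Calling `missing_inputs()` from the directory that contains `resources/` returns an empty list when every configured file is in place.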