explain secondary/supplementary aln for RNA-seq

Heng Li 2017-11-30 23:02:20 -05:00
parent 984f7846c0
commit 07d41efc2b
1 changed files with 14 additions and 0 deletions

@@ -154,6 +154,20 @@ slight improvement to accuracy. For noisy Nanopore Direct RNA-seq reads, it is
recommended to use a smaller k-mer size for increased sensitivity to the first
or the last exons.

Minimap2 rates an alignment by the score of the max-scoring sub-segment,
*excluding* introns, and marks the best alignment as primary in SAM. When a
spliced gene also has unspliced pseudogenes, minimap2 does not intentionally
prefer the spliced alignment, though in practice it more often marks the
spliced alignment as the primary. By default, minimap2 outputs up to five
secondary alignments (i.e. likely pseudogenes in the context of RNA-seq
mapping). This can be tuned with option **-N**.
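
To see how many primary and secondary alignments each read received, the SAM
FLAG column can be decoded downstream. The following is a minimal sketch, not
part of minimap2: the function names `classify_flag` and `count_classes` are
hypothetical, and only the FLAG bit values (0x4, 0x100, 0x800) come from the
SAM specification.

```python
# SAM FLAG bits as defined by the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify_flag(flag):
    """Return the alignment class encoded in a SAM FLAG value."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & FLAG_SECONDARY:
        return "secondary"
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def count_classes(sam_lines):
    """Tally alignment classes per read name (hypothetical helper)."""
    counts = {}
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        per_read = counts.setdefault(name, {})
        cls = classify_flag(flag)
        per_read[cls] = per_read.get(cls, 0) + 1
    return counts
```

Passing an open SAM file (or any iterable of SAM lines) to `count_classes`
yields, for each read, how many records of each class were reported. With the
default setting described above, a read should show at most five secondary
records.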

For long RNA-seq reads, minimap2 may produce chimeric alignments potentially
caused by gene fusions/structural variations or by an intron longer than the
max intron length **-G** (200k by default). For now, it is not recommended to
apply an excessively large **-G** as this slows down minimap2 and sometimes
leads to false alignments.
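
Chimeric alignments are reported in SAM as supplementary records, and the
other segments of the same read are listed in the `SA:Z` tag in the form
`rname,pos,strand,CIGAR,mapQ,NM;`, as laid out in the SAM optional-fields
specification. The sketch below parses that tag and applies a crude,
assumed heuristic: a segment on the same reference sequence and strand may be
an intron longer than **-G**, while anything else is more likely a
fusion or structural variation. The names `parse_sa_tag` and
`maybe_long_intron` are hypothetical and not part of minimap2.

```python
def parse_sa_tag(sam_line):
    """Return the segments listed in the SA:Z tag of one SAM record.

    Each segment is a dict with keys rname, pos, strand, cigar, mapq, nm.
    An empty list means the record carries no SA tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("SA:Z:"):
            segments = []
            for entry in tag[5:].split(";"):
                if not entry:  # the tag value ends with a trailing ';'
                    continue
                rname, pos, strand, cigar, mapq, nm = entry.split(",")
                segments.append({
                    "rname": rname,
                    "pos": int(pos),
                    "strand": strand,
                    "cigar": cigar,
                    "mapq": int(mapq),
                    "nm": int(nm),
                })
            return segments
    return []


def maybe_long_intron(sam_line, segment):
    """Heuristic (an assumption, not minimap2 logic): same contig and strand."""
    fields = sam_line.rstrip("\n").split("\t")
    strand = "-" if int(fields[1]) & 0x10 else "+"
    return fields[2] == segment["rname"] and strand == segment["strand"]
```

This only separates candidates for manual inspection; it does not measure
the gap between segments or confirm a splice junction.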

It is worth noting that by default `-x splice` prefers GT[A/G]..[C/T]AG
over GT[C/T]..[A/G]AG, and then over other splicing signals. Considering
one additional base improves the junction accuracy for noisy reads, but