Instructions on different long RNA-seq techs

Heng Li 2017-10-29 13:58:25 -04:00
parent 285eb0da05
commit 7f11f4c4d4
1 changed files with 33 additions and 12 deletions

@@ -9,15 +9,20 @@
```sh ```sh
git clone https://github.com/lh3/minimap2 git clone https://github.com/lh3/minimap2
cd minimap2 && make cd minimap2 && make
# long reads against a reference genome # long sequences against a reference genome
./minimap2 -a test/MT-human.fa test/MT-orang.fa > test.sam ./minimap2 -a test/MT-human.fa test/MT-orang.fa > test.sam
# create an index first and then map # create an index first and then map
./minimap2 -d MT-human.mmi test/MT-human.fa ./minimap2 -d MT-human.mmi test/MT-human.fa
./minimap2 -a MT-human.mmi test/MT-orang.fa > test.sam ./minimap2 -a MT-human.mmi test/MT-orang.fa > test.sam
# long-read overlap (no test data) # use presets (no test data)
./minimap2 -x ava-pb your-reads.fa your-reads.fa > overlaps.paf ./minimap2 -ax map-pb ref.fa pacbio.fq.gz > aln.sam # PacBio genomic reads
# spliced alignment (no test data) ./minimap2 -ax map-ont ref.fa ont.fq.gz > aln.sam # Oxford Nanopore genomic reads
./minimap2 -ax splice ref.fa rna-seq-reads.fa > spliced.sam ./minimap2 -ax sr ref.fa read1.fa read2.fa > aln.sam # short genomic paired-end reads
./minimap2 -ax splice ref.fa rna-reads.fa > aln.sam # spliced long reads
./minimap2 -ax splice -k14 -uf ref.fa reads.fa > aln.sam # Nanopore Direct RNA-seq
./minimap2 -cx asm5 asm1.fa asm2.fa > aln.paf # intra-species asm-to-asm alignment
./minimap2 -x ava-pb reads.fa reads.fa > overlaps.paf # PacBio read overlap
./minimap2 -x ava-ont reads.fa reads.fa > overlaps.paf # Nanopore read overlap
# man page for detailed command line options # man page for detailed command line options
man ./minimap2.1 man ./minimap2.1
``` ```
@@ -132,14 +137,30 @@ Nanopore reads.
#### <a name="map-long-splice"></a>Map long mRNA/cDNA reads #### <a name="map-long-splice"></a>Map long mRNA/cDNA reads
```sh ```sh
minimap2 -ax splice ref.fa spliced.fq > aln.sam # strand unknown minimap2 -ax splice -uf ref.fa iso-seq.fq > aln.sam # PacBio Iso-seq/traditional cDNA
minimap2 -ax splice -uf ref.fa spliced.fq > aln.sam # assuming transcript strand minimap2 -ax splice ref.fa nanopore-cdna.fa > aln.sam # Nanopore 2D cDNA-seq
minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam # Nanopore Direct RNA-seq
minimap2 -ax splice --splice-flank=no SIRV.fa SIRV-seq.fa # mapping against SIRV control
``` ```
This command line has been tested on PacBio Iso-Seq reads and Nanopore 2D cDNA There are different long-read RNA-seq technologies, including traditional
reads, and been shown to work with Nanopore 1D Direct RNA reads by others. Like full-length cDNA, EST, PacBio Iso-seq, Nanopore 2D cDNA-seq and Direct RNA-seq.
typical RNA-seq mappers, minimap2 represents an intron with the `N` CIGAR They produce data of varying quality and properties. By default, `-x splice`
operator. For spliced reads, minimap2 will try to infer the strand relative to assumes the read orientation relative to the transcript strand is unknown. It
transcript and may write the strand to the `ts` SAM/PAF tag. tries two rounds of alignment to infer the orientation and write the strand to
the `ts` SAM/PAF tag if possible. For Iso-seq, Direct RNA-seq and traditional
full-length cDNAs, it is advisable to apply `-u f` to force minimap2 to
consider the forward transcript strand only. This speeds up alignment and
slightly improves accuracy. For noisy Nanopore Direct RNA-seq reads, it is
recommended to use a smaller k-mer size (e.g. `-k14`) for increased
sensitivity to the first or the last exons.
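
To check how often the orientation was inferred, one can tally the `ts` tag over the alignment records. The following is a minimal sketch, not part of minimap2: `count_ts` is a hypothetical helper name, and it assumes the tag is written as a single-character field, `ts:A:+` or `ts:A:-`, among the optional SAM columns.

```sh
# Hypothetical helper (not shipped with minimap2): tally the ts tag in a SAM file.
# Assumption: the tag appears as ts:A:+ or ts:A:- in the optional fields.
count_ts() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            strand = "none"                # record carries no ts tag
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i ~ /^ts:A:/) strand = substr($i, 6)
            n[strand]++
        }
        END {
            printf "+\t%d\n-\t%d\nnone\t%d\n", n["+"], n["-"], n["none"]
        }' "$1"
}
```

Running `count_ts aln.sam` prints three tab-separated rows: the number of records tagged `+`, tagged `-`, and carrying no `ts` tag.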
It is worth noting that by default `-x splice` prefers GT[A/G]..[C/T]AG
over GT[C/T]..[A/G]AG, and then over other splicing signals. Considering
one additional base improves the junction accuracy for noisy reads, but
reduces the accuracy when aligning against the widely used SIRV control data.
This is because SIRV does not honor the evolutionarily conserved splicing
signal. If you are studying SIRV, you may apply `--splice-flank=no` to let
minimap2 only model GT..AG, ignoring the additional base.
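
The per-technology choices described above can be collected in a small wrapper. This is only a sketch: `splice_cmd` and its technology labels are made up for illustration, and the function prints the command line rather than running it. The options themselves are the ones listed in this section.

```sh
# Hypothetical helper: print (not run) the minimap2 command for a technology.
# The labels on the left are invented for this sketch; the options come from
# the command lines listed in this section.
splice_cmd() {
    tech=$1 ref=$2 reads=$3
    case "$tech" in
        iso-seq|cdna)  opts="-ax splice -uf" ;;                # PacBio Iso-seq / traditional cDNA
        nanopore-cdna) opts="-ax splice" ;;                    # Nanopore 2D cDNA, strand unknown
        direct-rna)    opts="-ax splice -uf -k14" ;;           # Nanopore Direct RNA-seq
        sirv)          opts="-ax splice --splice-flank=no" ;;  # SIRV control
        *) echo "unknown technology: $tech" >&2; return 1 ;;
    esac
    echo "minimap2 $opts $ref $reads"
}
```

For example, `splice_cmd direct-rna ref.fa direct-rna.fq` prints the Direct RNA-seq command line, which can be reviewed and then run with its output redirected to `aln.sam`.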
#### <a name="long-overlap"></a>Find overlaps between long reads #### <a name="long-overlap"></a>Find overlaps between long reads