diff --git a/cookbook.md b/cookbook.md index 3e8fca1..e5acf6b 100644 --- a/cookbook.md +++ b/cookbook.md @@ -5,6 +5,9 @@ * [Mapping long reads](#map-pb) * [Mapping Illumina paired-end reads](#map-sr) * [Evaluating mapping accuracy with simulated reads (for developers)](#mapeval) +- [Full-Genome Alignment](#genome-aln) + * [Intra-species assembly alignment](#asm-to-ref) + * [Calling variants from assembly-to-reference alignment](#asm-var) - [Read Overlap](#read-overlap) * [Long-read overlap](#long-read-overlap) * [Evaluating overlap sensitivity (for developers)](#ov-eval) @@ -39,7 +42,7 @@ line. **Please always make sure you are using an intended pre-built index.** ### Mapping Illumina paired-end reads: ```sh -minimap2 -ax sr ecoli_ref.fa ecoli_mason_1.fq ecoli_mason_2.fq > mapped-sr.sam +minimap2 -ax sr -t4 ecoli_ref.fa ecoli_mason_1.fq ecoli_mason_2.fq > mapped-sr.sam ``` ### Evaluating mapping accuracy with simulated reads (for developers) @@ -78,6 +81,50 @@ paftools.js mason2fq tmp.sam | seqtk seq -1 > ecoli_mason_2.fq +## Full-Genome Alignment + +### Intra-species assembly alignment +```sh +minimap2 -cx asm5 --cs ecoli_ref.fa ecoli_canu.fa > ecoli_canu.paf +``` +Here `ecoli_canu.fa` is the Canu assembly of `ecoli_p6_25x_canu.fa`. This +command line outputs alignments in the [PAF format][paf]. Use `-a` instead of +`-c` to get output in the SAM format. + +### Calling variants from assembly-to-reference alignment +```sh +# don't forget the "--cs" option; otherwise it doesn't work +minimap2 -cx asm5 --cs ecoli_ref.fa ecoli_canu.fa \ + | sort -k6,6 -k8,8n \ + | paftools.js call - > out.txt +``` +The first few lines in `out.txt` are +``` +R NC_000913 0 257899 +V NC_000913 668 669 1 60 g - tig00000001 3439623 3439623 + +V NC_000913 896 897 1 60 g - tig00000001 3439850 3439850 + +V NC_000913 1304 1305 1 60 g - tig00000001 3440257 3440257 + +``` +where an `R`-line gives the regions coverged by one contig; a `V`-line consists of: + +1. Reference sequence name +2. Reference start +3. Reference end (a 0-based, half-close-half-open interval) +4. Contig coverage (please ignore a `V`-line if this number is not one!) +5. Mean mapping quality of contigs covering the site +6. Reference base ("-" for an insertion) +7. Contig base ("-" for a deletion) +8. Contig name +9. Contig start +10. Contig end +11. Contig strand + +`paftools.js call` does not support VCF output. This feature is planned but has +not been implemented yet. + + + + ## Read Overlap ### Long read overlap @@ -86,6 +133,8 @@ paftools.js mason2fq tmp.sam | seqtk seq -1 > ecoli_mason_2.fq minimap2 -x ava-pb ecoli_p6_25x_canu.fa ecoli_p6_25x_canu.fa > overlap.paf # For Nanopore reads (ava-ont also works with PacBio but not as good): minimap2 -x ava-ont -r 10000 ecoli_p6_25x_canu.fa ecoli_p6_25x_canu.fa > overlap.paf +# If you have miniasm installed: +miniasm -f ecoli_p6_25x_canu.fa overlap.paf > asm.gfa ``` Here we explicitly applied `-r 10000`. We are considering to set this as the default for the `ava-ont` mode as this seems to improve the contiguity for @@ -120,3 +169,4 @@ minimap2 -ax splice -k14 -uf ref.fa reads.fa > aln.sam [pbsim]: https://github.com/pfaucon/PBSIM-PacBio-Simulator [mason2]: https://github.com/seqan/seqan/tree/master/apps/mason2 +[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md