... | ... | @@ -23,7 +23,7 @@ Now we can run megahit as follows (for the Illumina samples G or H): |
|
|
$ megahit -m 0.2 -1 sample.1.fa -2 sample.2.fa -o $sample -t 8 --min-contig-len 500
```
|
|
|
|
|
|
The assembly of independent samples should take \~ 1.5hr. Once you have an assembly, you can use the stats.sh program from bbmap (/bioinf/software/bbmap) on the contigs file to get some basic information about it.
|
|
|
The assembly of independent samples should take <span dir="">\~</span> 1.5hr. Once you have an assembly, you can use the stats.sh program from bbmap (/bioinf/software/bbmap) on the contigs file to get some basic information about it.
|
|
|
|
|
|
Have a look at the stats, what are the N50 and L50 values? How many contigs do you have? What’s the total length of the assembly? Put your results in the table below and we can compare.
|
|
|
| sample | # of contigs | N50 bp | L50 | Assembly length bp | Longest contig bp |
|
... | ... | @@ -31,7 +31,7 @@ Have a look at the stats, what are the N50 and L50 values? How many contigs do y |
|
|
| G | | | | | |
|
|
|
| H | | | | | |
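If you want to check the numbers for the table by hand, the sketch below computes N50 and L50 from a list of contig lengths, using the common convention that N50 is a length in bp and L50 is a contig count. The function names `fasta_lengths` and `n50_l50` are made up for this example, and the file path in the usage note assumes MEGAHIT wrote its contigs to `final.contigs.fa` inside the output directory. Some tools label the two values the other way round, so when reading the stats.sh output, check which value is a length in bp and which is a number of contigs.

```python
def fasta_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downwards, first reaches half of the assembly length.
    L50: number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("no contigs given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
```

For example, `n50_l50(fasta_lengths("G/final.contigs.fa"))` would give the two values for sample G, assuming `G` was the output directory passed to `-o`.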
|
|
|
|
|
|
Now, we will do a similar analysis but using the PacBio reads U and Q. Since we are dealing with long reads, we are going to use the [Flye](https://github.com/fenderglass/Flye) assembler. Each library should take \~25 minutes to run.
|
|
|
Now, we will do a similar analysis but using the PacBio reads U and Q. Since we are dealing with long reads, we are going to use the [Flye](https://github.com/fenderglass/Flye) assembler. Each library should take <span dir="">\~</span>25 minutes to run.
|
|
|
|
|
|
```plaintext
flye -t 8 --meta --pacbio-hifi sample.1.fa -o sample.pacbio
|
... | ... | @@ -57,7 +57,7 @@ Next you can run MaxBin for samples **G** or **H**. Use |
|
|
$ marmic_NGS2021/software/MaxBin-2.2.4/run_MaxBin.pl -min_contig_length 1500 -thread 8
```
|
|
|
|
|
|
... and the reads we used for the assembly. Don’t use more than 8 threads. Using these settings the binning step should take \~20 m.
|
|
|
... and the reads we used for the assembly. Don’t use more than 8 threads. Using these settings the binning step should take <span dir="">\~</span>20 m.
|
|
|
|
|
|
### Binning using PacBio assemblies
|
|
|
|
... | ... | @@ -75,7 +75,7 @@ run_MaxBin.pl -contig Q.contigs.fa -thread 10 -min_contig_length 1500 -abund dep |
|
|
|
|
|
What did we get as an output from MaxBin? Try to run stats.sh on your bins like we did before for the assembly and see what you think of the output. Think about what we know and what we don’t know about these bins; we’ll talk more tomorrow about how we can check that we’ve recovered the genomes that were in the original dataset and how we can further investigate them.
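As a complement to stats.sh, here is a minimal sketch that prints the number of contigs, the total length and the GC content of each bin, which is a quick way to see whether the bins look like separate genomes. It assumes the bins are FASTA files ending in `.fasta` in the current directory; adjust the pattern to whatever output prefix you gave MaxBin. The function name `summarize_bin` is invented for this example.

```python
import glob


def summarize_bin(path):
    """Return (number of contigs, total length in bp, GC percentage) for one FASTA file."""
    n_contigs = 0
    total = 0
    gc = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                n_contigs += 1
            elif line:
                seq = line.upper()
                total += len(seq)
                gc += seq.count("G") + seq.count("C")
    gc_pct = 100.0 * gc / total if total else 0.0
    return n_contigs, total, gc_pct


if __name__ == "__main__":
    # Assumption: bins sit in the current directory and end in .fasta
    for path in sorted(glob.glob("*.fasta")):
        n_contigs, total, gc_pct = summarize_bin(path)
        print(f"{path}\t{n_contigs} contigs\t{total} bp\t{gc_pct:.1f}% GC")
```

Bins with very different GC content or size are more likely to come from different organisms, but these numbers alone say nothing about completeness or contamination.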
|
|
|
|
|
|
**Additional activity for more advanced students**
|
|
|
**Additional activity for advanced students**
|
|
|
|
|
|
## Anvi’o
|
|
|
|
... | ... | |