Changes

Coto · f7f5a3e4
--- a/Practical-5-Assembly-and-binning.md
+++ b/Practical-5-Assembly-and-binning.md
@@ -9,13 +9,14 @@ I added a new script in `/bioinf/transfer/marmic_NGS2022/software` called `FastA
 **Using the script above, determine the average read length for Illumina libraries (G and H) and PacBio libraries (U and Q).**

 **What can you say about the length distribution between the samples? What's the average unassembled read length?**
-| Library | `#` reads | Avg. read length | Number of total bases |
-|---------|-----------|------------------|-----------------------|
+| Library | `#` reads | Total number of bases | Avg. read length |
+|---------|-----------|-----------------------|------------------|
 | G |  |  |  |
 | H |  |  |  |
 | U |  |  |  |
 | Q |  |  |  |

+
 For the assembly of the metagenomic samples we are going to use megahit. Be sure to also check out other assemblers such as IDBA-ud and SPAdes.

 First, make sure it is available. It should be included in the marmic2022 conda environment.
@@ -27,12 +28,12 @@ $ wget https://github.com/voutcn/megahit/releases/download/v1.2.9/MEGAHIT-1.2.9-
 $ tar zvxf MEGAHIT-1.2.9-Linux-x86_64-static.tar.gz
 ```

-After the installation, explore the options available. First, we are going to assemble each metagenomic sample independently ~~and also as a co-assembly~~. Given that time is limiting during the practical, each student will use only ONE k-mer of the following list: 33,37,47,53,57,63,67, and 73 (that will take less time to run, **why?**). During normal/extended analyses, I'd recommend use a wider range of k-mer sizes, e.g., default values.
+First, we are going to assemble each metagenomic sample independently ~~and also as a co-assembly~~. Given that time is limited during the practical, each student will use only ONE k-mer from the following list: 33, 37, 47, 53, 57, 63, 67, and 73 (that will take less time to run, **why?**). During normal/extended analyses, I'd recommend using a wider range of k-mer sizes, e.g., the default values.

 Now we can run megahit as follows (**for the Illumina sample G**):

 ```plaintext
-$ megahit -m 0.2 -1 G.1.fa -2 G.2.fa -o $sample -t 8 --min-contig-len 500 --k-list #your-assigned-kmer  
+$ megahit -m 0.2 -1 G.1.fa -2 G.2.fa -o G-{your-assigned-kmer} -t 8 --min-contig-len 500 --k-list {your-assigned-kmer}
 ```
 | Student | k-mer | `#` contigs (>500bp) | N50 and L50 | Longest contig (bp) | Total length (bp) |
 |---------|-------|----------------------|-------------|---------------------|-------------------|
@@ -49,12 +50,13 @@ $ megahit -m 0.2 -1 G.1.fa -2 G.2.fa -o $sample -t 8 --min-contig-len 500 --k-li

 The assembly of samples using single k-mer sizes should take ~10 minutes. Once you have an assembly, you can use the `stats.sh` program from bbmap (`/bioinf/software/bbmap`) on the contigs file to get some basic information about it.

-Given the limited time we have during the class, we have previously generated the assemblies for the Illumina and PacBio libraries for you.
+For the rest of the session, we will use the complete assembly (i.e., the result of using all k-mers). Given the limited time we have during the class, we have previously generated the assemblies for the Illumina and PacBio libraries for you.

 The assemblies are located here:

 ```plaintext
-day_3
+day_4/01.Illumina/02.full-assembly
+day_4/02.PacBio/02.full-assembly
 ```

 For the generation of the PacBio assemblies we used the following command: