... | ... | @@ -12,9 +12,9 @@ A typical line we would use to run `checkm` looks like this one: |
|
|
$ checkm lineage_wf -f checkm_MaxBin.txt --tab_table -x fasta -t 10 --pplacer_threads 10 <folder with bins> <output location>
|
|
|
```
|
|
|
|
|
|
**Like we mentioned above, we have previously selected a group of 20 bins using the assemblies of the metagenomes we are analyzing**. Find the bins in today's folder. From this point forward, we will be referring to this last group of bins.
|
|
|
**As mentioned above, we have already run checkM on all bins.** Find the bins and the checkM results in today's folder. From this point forward, we will be referring to this group of bins.
|
|
|
|
|
|
See the output file generated by checkM (e.g., checkm_MaxBin-selected-20.txt). It should look something like this:
|
|
|
See the output file generated by checkM (`out_checkM_marmic2021-allbins.tab`). It should look something like this:
|
|
|
|
|
|
```plaintext
|
|
|
Bin Id Marker lineage # genomes # markers # marker sets 0 1 2 3 4 5+ Completeness Contamination Strain heterogeneity
|
... | ... | @@ -27,7 +27,9 @@ Q.maxbin.008 k__Bacteria (UID203) 5449 104 58 36 32 31 4 1 0 68.12 39.26 4.08 |
|
|
Q.maxbin.006 k__Bacteria (UID203) 5449 104 58 22 27 55 0 0 0 65.52 27.43 0.00
|
|
|
```
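A table like the one above can also be screened from the command line. The sketch below assumes the tab-separated layout shown (as produced with `--tab_table`), where Completeness is column 12 and Contamination is column 13. The function name `filter_bins` and the default cutoffs (at least 50% completeness, less than 10% contamination) are illustrative choices, not part of checkM.

```shell
# Hypothetical helper: keep the header plus bins that pass both cutoffs.
# Assumes a tab-separated checkM table with Completeness in column 12
# and Contamination in column 13, as in the example table above.
filter_bins() {
  # $1: checkM tab table
  # $2: minimum completeness in percent (default 50, an assumed cutoff)
  # $3: maximum contamination in percent (default 10, an assumed cutoff)
  awk -F'\t' -v min_comp="${2:-50}" -v max_cont="${3:-10}" \
    'NR == 1 || ($12 + 0 >= min_comp + 0 && $13 + 0 < max_cont + 0)' "$1"
}
```

For example, `filter_bins out_checkM_marmic2021-allbins.tab 90 5` would apply stricter cutoffs to the course file.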
|
|
|
|
|
|
What do you think about the quality of these bins? In the following activities we will be analyzing some of them using anvi'o. For now, let's analyze them a little further. We can ask checkM to give us a bit more taxonomical information using the `checkm tree` function incorporated in checkM.
|
|
|
Use `grep` to select the bins from libraries G and Q.
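One possible way to do this is sketched below. It assumes that each bin ID starts with the library letter followed by a dot (for example `Q.maxbin.008`), as in the table above, and that the header line starts with `Bin Id`. The function name `select_gq_bins` is only illustrative.

```shell
# Hypothetical helper: keep the header line plus the rows whose
# Bin Id starts with "G." or "Q." (assumed library prefixes).
select_gq_bins() {
  # $1: checkM tab table
  grep -E '^(Bin Id|[GQ]\.)' "$1"
}
```

For example, `select_gq_bins out_checkM_marmic2021-allbins.tab > checkM_GQ.tab` would write the selection to a new file (the output name is arbitrary).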
|
|
|
|
|
|
What do you think about the quality of these bins? In the following activities we will be analyzing some of them using anvi'o. For now, let's analyze them a little further. We can ask checkM for a bit more taxonomic information using its `checkm tree_qa` command (**you don't have to run it, we have already generated the file for you**). Nonetheless, this is the command you'd use:
|
|
|
|
|
|
```plaintext
|
|
|
$ checkm tree_qa out-checkm-allbins/ -o 2 -f detailed_checkM-allbins --tab_table
|
... | ... | @@ -40,15 +42,7 @@ Bin Id # unique markers (of 43) # multi-copy Insertion branch UID Taxonomy (cont |
|
|
U.maxbin.012 41 0 UID3398 k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae g__;s__ 51.6079429195 3.427858 3796 0.883138974835 11 9 58.9159876191 2.34719045897 4.32964955556 0.425642799849 4296.55555556 377.714504505
|
|
|
```
|
|
|
|
|
|
Keep in mind that you could also evaluate the completeness and contamination of MAGs by finding ‘essential’ protein sequences. Take a look at the ‘HMM.essential.rb’ script found in the course folder. This script is part of a larger collection of tools available at <https://github.com/lmrodriguezr/enveomics>. If you are interested in doing other meta(genomic) analyses, you will find many other useful scripts in this repository.
|
|
|
|
|
|
In order to run this script, we first need to have protein sequences of the respective reference genomes and the bins. We will use the gene prediction software Prodigal for this. First have a look at the help menu:
|
|
|
|
|
|
```plaintext
|
|
|
$ /bioinf/software/Prodigal/Prodigal-2.6.2/prodigal -h
|
|
|
```
|
|
|
|
|
|
Can you figure out how to run it? Once you have your protein translations you are ready. Use the HMM.essential.rb script and compare the completeness/contamination to the values you previously obtained using checkM. Why do you think these values are not the same?
|
|
|
Keep in mind that you could also evaluate the completeness and contamination of MAGs by finding ‘essential’ protein sequences. Take a look at the ‘HMM.essential.rb’ script found in a collection of tools available at <https://github.com/lmrodriguezr/enveomics>. If you are interested in doing other (meta)genomic analyses, you will find many other useful scripts in this repository.
|
|
|
|
|
|
## GTDB-tk
|
|
|
|
... | ... | |