... | ... | @@ -29,7 +29,7 @@ Q.maxbin.006 k__Bacteria (UID203) 5449 104 58 22 27 55 0 0 0 65.52 27.43 0.00 |
|
|
|
|
|
Use `grep` to select the bins from libraries G and Q.
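One possible way to do this selection is sketched below. It assumes the bin ID is the first field on each line and that the library letter is followed by a dot, as in `Q.maxbin.006`. The file name `checkm_summary_demo.tsv`, the bin `G.maxbin.001`, and the placeholder second column are hypothetical stand-ins so the example runs on its own; substitute your own checkM table.

```shell
# Hypothetical file name; replace it with your own checkM summary table.
table="checkm_summary_demo.tsv"

# Build a tiny stand-in table so the sketch is self-contained.
# The second column is a placeholder, not real checkM output.
printf 'G.maxbin.001\tplaceholder\n'  > "$table"
printf 'Q.maxbin.006\tplaceholder\n' >> "$table"
printf 'U.maxbin.012\tplaceholder\n' >> "$table"

# Keep only rows whose bin ID starts with "G." or "Q."
# (leading whitespace is allowed, in case the table is indented).
selected=$(grep -E '^[[:space:]]*(G|Q)\.' "$table")
printf '%s\n' "$selected"
```

Anchoring the pattern at the start of the line avoids matching a "G" or "Q" that happens to appear elsewhere in a row, for example in a taxonomy string.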
|
|
|
|
|
|
What do you think about the quality of these bins? In the following activities we will be analyzing some of them using anvi'o. For now, let's analyze them a little further. We can ask checkM to give us a bit more taxonomical information using the `checkm tree` function incorporated in checkM (**you don't have to run it, we have generated the file already for you**). Nonetheless, this is the command you'd use:
|
|
|
What do you think about the quality of these bins? Tomorrow, Chy we will be analyzing some of them using anvi'o (demo). For now, let's analyze them a little further. We can ask checkM to give us a bit more taxonomical information using the `checkm tree` function incorporated in checkM (**you don't have to run it, we have generated the file already for you**). Nonetheless, this is the command you'd use:
|
|
|
|
|
|
```plaintext
|
|
|
$ checkm tree_qa out-checkm-allbins/ -o 2 -f detailed_checkM-allbins --tab_table
|
... | ... | @@ -42,13 +42,17 @@ Bin Id # unique markers (of 43) # multi-copy Insertion branch UID Taxonomy (cont |
|
|
U.maxbin.012 41 0 UID3398 k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodobacterales;f__Rhodobacteraceae g__;s__ 51.6079429195 3.427858 3796 0.883138974835 11 9 58.9159876191 2.34719045897 4.32964955556 0.425642799849 4296.55555556 377.714504505
|
|
|
```
|
|
|
|
|
|
Keep in mind that you could also evaluate the completeness and contamination of MAGs by finding ‘essential’ protein sequences. Take a look at the ‘HMM.essential.rb’ script found in acollection of tools available at <https://github.com/lmrodriguezr/enveomics>. If you are interested in doing other meta(genomic) analyses, you will find many other useful scripts in this repository.
|
|
|
Keep in mind that you could also evaluate the completeness and contamination of MAGs by finding ‘essential’ protein sequences from other collections. Take a look at the ‘HMM.essential.rb’ script found in a collection of tools available at <https://github.com/lmrodriguezr/enveomics>. If you are interested in doing other (meta)genomic analyses, you will find many other useful scripts in this repository.
|
|
|
|
|
|
## GTDB-tk
|
|
|
|
|
|
To run GTDB-tk, we'll need more compute resources than we have on the linux-desktops. Which means we need to learn how to use the high performance computing (HPC) infrastructure at the MPI. Yay!
|
|
|
To run GTDB-tk, we'll need more compute resources than we have on the linux-desktops, which means that we would need to learn how to use the high performance computing (HPC) infrastructure at the MPI. Yay!
|
|
|
|
|
|
Access to the HPC computers is via a scheduling system called Slurm. To use Slurm, we prepare a script that details what resources we want to use, and then the software we want to run. We then submit that script to the scheduler and it deals with finding and allocating the necessary resources (memory, cpus, runtime etc.). Important commands to know are `sinfo` and `squeue`. These tell you what resources are available, and what jobs are currently running, respectively. Try them out now.
|
|
|
**For the purposes of our class, we are not going to be using the HPC infrastructure at the MPI.** **We will examine the results for the MAGs above**.
|
|
|
|
|
|
**_(Skip this part for now. It is a good reference for the future, whenever your research projects need to be analyzed using the HPC infrastructure. Please scroll down until you see a "Resume here" paragraph.)_**
|
|
|
|
|
|
Nonetheless, if you are reading this guide after the class, here is how it works: access to the HPC computers is via a scheduling system called Slurm. To use Slurm, we prepare a script that details what resources we want to use, and then the software we want to run. We then submit that script to the scheduler, and it deals with finding and allocating the necessary resources (memory, CPUs, runtime etc.). Important commands to know are `sinfo` and `squeue`. These tell you what resources are available, and what jobs are currently running, respectively. Try them out now.
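To make the idea of "a script that details resources, then the software" more concrete, here is a minimal sketch of what such a Slurm batch script can look like. The file name `gtdbtk_job_sketch.sh`, the job name, and every resource value are placeholder assumptions, not the settings used on the MPI cluster, and the software step is only an `echo` stand-in rather than a real GTDB-tk call.

```shell
# Write a minimal batch script to disk (all values are placeholders).
cat > gtdbtk_job_sketch.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gtdbtk_sketch        # name shown in squeue
#SBATCH --cpus-per-task=8               # number of CPUs requested
#SBATCH --mem=100G                      # memory requested
#SBATCH --time=12:00:00                 # maximum runtime (hh:mm:ss)
#SBATCH --output=gtdbtk_sketch_%j.log   # log file; %j expands to the job ID

# The software to run goes below the resource requests.
# This echo is a stand-in for the real command.
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK:-1} CPUs"
EOF

# Check the script for shell syntax errors without running it.
bash -n gtdbtk_job_sketch.sh && echo "syntax OK"
```

On a machine with Slurm installed, a script like this would be handed to the scheduler with `sbatch gtdbtk_job_sketch.sh`, after which `squeue` should list the job.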
|
|
|
|
|
|
To do our actual computing, first we need to install GTDB-tk and set the database:
|
|
|
|
... | ... | @@ -103,8 +107,6 @@ Now you can run `squeue` and see if your job is running! Also you'll see a new f |
|
|
|
|
|
It's also possible to load the phylogenetic tree output from GTDB-tk into arb, but it's a bit of a faff. We prepared an example of what it looks like for you to explore; it's in the `marmic_NGS2021/results/day_4/gtdbtk` directory. There's an arb database in there you can view: first go to an arb server with `$ ssh arb-X` (put a number from 1 to 3 in place of the X), then run `arb` from the command line and open the `MarMic-gtdbtk-example.arb` database when the window pops up.
|
|
|
|
|
|
# (
|
|
|
|
|
|
If you _really_ want to view your own tree (you have to be a bit masochistic but whatever, you do you), you'll need to click 'create and import', then choose the file `gtdbtk.bac120.msa.fasta` in your `gtdbtk_denovo` directory, set the Type in the dropdown menu to 'protein', and choose 'fasta_wgap.ift' from the list on the right. When prompted, select 'Generate unique species IDs', then 'None (only 'acc')'. Now you have a database. In the top row of icons you'll see a padlock, and below it a dropdown set of numbers. Change that from '0' to '6'. Then in the top row of menus, click 'Species', then 'Search and Query'. In the new window, click the 'Search' button. Then in the 'More functions' tab, click 'Set Protection of Fields of Listed Species'. This gives you yet another window. Here click on 'name' in the right panel, then '0 temporary' from the list on the left, then click the 'Assign protection to field of listed' button. Nothing will obviously happen but trust me, that's how it's meant to be. Go ahead and click close on this window and also on the 'Search and Query' window. Now click 'File' from the top row, then 'Export', then 'Export fields (to calc-sheet using NDS)', make sure the Column output at the bottom is 'TAB separated', then hit 'Save', then 'close'.
|
|
|
|
|
|
_Now we need to open a new terminal, but don't close ARB!_
|
... | ... | @@ -115,7 +117,7 @@ Click 'File' again, then 'Import', then 'Import fields from calc-sheet'. For the |
|
|
|
|
|
Now it's tree time. Click 'Tree' from the top line of menus, then 'Tree admin'. Then click 'Import', and there should be only the `gtdbtk.bac120.decorated.tree` available. So click that, then the 'Load' button. Finally, in the top row of icons, you'll see some buttons that look a bit like tiny trees. Click the second of those three to view your tree! Was all that pain truly worth it though?
|
|
|
|
|
|
# )
|
|
|
# Resume here:
|
|
|
|
|
|
## ANI and AAI
|
|
|
|
... | ... | |