Now lets save our results, so that we can use them again later without having to do all the calculations again:
Now lets save our results, so that we can use them again later without having to do all the calculations again. Before doing this, let's create a new directory that will contain all of our finished outputs.
> saveRDS(seqtab.nochim,"seqtab_final.rds")
> saveRDS(taxa, "taxa_final.rds")
> dir.create("dada_output") # This will be created in our working directory
Now that we have these outputs, we need to make them useful! One way to do that is to use the R package phyloseq, which will integrate all kinds of fun things. You can check out the phyloseq/DADA2 interaction on the DADA2 website: https://benjjneb.github.io/dada2/tutorial.html
Now go ahead and save the files
For now though, we're just going to use a (relatively) simple plotting function to inspect the sequence diversity in our samples. If you get bored you can try and puzzle through what the plot function is doing and start changing things. But if you're not bothered, you still get some nice plots :)
> saveRDS(seqtab.nochim,"seqtab_final.rds")
> saveRDS(taxa, "taxa_final.rds")
# Start from here on Tuesday
We're all going to start today by making some plots. Sorry if the full dada processing took too long and you didn't get to run it all yourself. If you didn't, all you missed was running some R code and inspecting big unwieldy tables.
Start by getting the full processed dada data:
The files saved above are the standard DADA2 files but we can make them more readable and in a better format to use with other tools also.
The following lines of code are simply editing the data files and saving them so that you have the final data in the best possible format for further analysis.
We should now have three output files that contain all of the information in a nice and clear format:
ASV_representatives.fasta = fasta file with our ASV sequences
ASV_representatives_count_table.tsv = table containing ASVs and their counts across samples
ASV_representatives_taxa_table.tsv = table containing ASVs and their taxonomy
**Visualise the data**
Great job if you have managed to get everything working and the output files produced. However, if you couldn't manage too, then don't worry! We have already got the output files for you.
# you don't need this if you created these objects yourself already
Now we are going to use the produced files in order to make some plots and visualise the data. To do this, we are going to use the phyloseq package. Phyloseq is a very powerful tool for ecological analysis. It allows you to combine together lots of information into a single object, e.g. count data, taxa data, sample metadata, phylogenetic trees etc.
# then start R
$ R
First, let's load the necessary libraries:
load the data:
> seqtab.nochim <- readRDS("seqtab_final.rds")
> taxa <- readRDS("taxa_final.rds")
then do the plotting :)
```
library(phyloseq)
library(ggplot2)
library(reshape2)
library(RColorBrewer)
library(vegan)
library(plyr)
Before we create our phyloseq object, we can import some metadata about our samples, e.g. Date, location etc.
(Hint - try Desulfobacterota (with reduced minimum abundance) or Campilobacterota, or try changing the taxonomic level you view, to see what inferences you can make about your samples)
If you want to look at subsets of the data, which would be a good idea, try selecting just a handful of rows of interest from the `seqtab.nochim` object and feeding that into the plotting function. R has a `grep()` function, that you'll need to look up. The sample names you'll want to search in are in `row.names(seqtab.nochim)`. Subsetting an R object is done by using square brackets `[]` after the name of the object you want to subset.