... | @@ -3,7 +3,7 @@ |
... | @@ -3,7 +3,7 @@ |
|
Before we begin, there's one thing we need to download, which is the formatted silva database used by DADA2 to assign taxonomy.
|
|
Before we begin, there's one thing we need to download, which is the formatted silva database used by DADA2 to assign taxonomy.
|
|
|
|
|
|
$ cd /bioinf/home/your_username/marmic_NGS2020/day_1/demultiplexed
|
|
$ cd /bioinf/home/your_username/marmic_NGS2020/day_1/demultiplexed
|
|
$ wget https://zenodo.org/record/1172783/files/silva_nr_v132_train_set.fa.gz
|
|
$ wget https://zenodo.org/record/3986799/files/silva_nr99_v138_train_set.fa.gz
|
|
|
|
|
|
### 3.1 Processing our data with DADA2
|
|
### 3.1 Processing our data with DADA2
|
|
Now to get going with DADA2. DADA2 is a very large software with an entire integrated pipeline that takes raw reads (minus primers and barcodes) as input. It also needs lots of dependencies to do all of its things. So in order to avoid wasting several hours installing it, I have a version of R with everything installed that you'll be using for this practical.
|
|
Now to get going with DADA2. DADA2 is a very large software with an entire integrated pipeline that takes raw reads (minus primers and barcodes) as input. It also needs lots of dependencies to do all of its things. So in order to avoid wasting several hours installing it, I have a version of R with everything installed that you'll be using for this practical.
|
... | @@ -12,7 +12,7 @@ Now to get going with DADA2. DADA2 is a very large software with an entire integ |
... | @@ -12,7 +12,7 @@ Now to get going with DADA2. DADA2 is a very large software with an entire integ |
|
|
|
|
|
and if you're not already there, go to the data:
|
|
and if you're not already there, go to the data:
|
|
|
|
|
|
> setwd("/bioinf/home/your_username/marmic_NGS2020/day_1/demultiplexed")
|
|
> setwd("/bioinf/home/your_username/marmic_NGS2021/day_1/demultiplexed")
|
|
|
|
|
|
Then start dada2:
|
|
Then start dada2:
|
|
|
|
|
... | @@ -28,7 +28,7 @@ What are fnFs and fnRs? Think about why we might want to make lists of files. (R |
... | @@ -28,7 +28,7 @@ What are fnFs and fnRs? Think about why we might want to make lists of files. (R |
|
|
|
|
|
> sample.names <- sapply(strsplit(basename(fnFs), ".R"), `[`, 1)
|
|
> sample.names <- sapply(strsplit(basename(fnFs), ".R"), `[`, 1)
|
|
|
|
|
|
This one's a bit funny syntactically. Use the `?` to try and figure out what sapply does. The oddest bit is that single square bracket `[`. In R, the square bracket is actually a function much like `+` and `-`! It's actually the subset function here.
|
|
This one's a bit funny syntactically. Use the `?` to try and figure out what sapply does. The oddest bit is that single square bracket `[`. In R, the square bracket is actually a function much like `+` and `-`! Here it's the subset function.
|
|
|
|
|
|
Now we have our file names, we can look at the quality values of the reads:
|
|
Now we have our file names, we can look at the quality values of the reads:
|
|
|
|
|
... | @@ -95,7 +95,7 @@ Don't worry about what this code is actually doing (clearly the developer knew w |
... | @@ -95,7 +95,7 @@ Don't worry about what this code is actually doing (clearly the developer knew w |
|
How do the numbers look? It's ok to lose a few more to chimeras than the other steps, so long as there's not anywhere else that we lose more than 50% of our reads.
|
|
How do the numbers look? It's ok to lose a few more to chimeras than the other steps, so long as there's not anywhere else that we lose more than 50% of our reads.
|
|
The final step is then to classify the sequences, using the silva database we downloaded before. This also takes a while, so again, we might start it and then let it run while we do something else.
|
|
The final step is then to classify the sequences, using the silva database we downloaded before. This also takes a while, so again, we might start it and then let it run while we do something else.
|
|
|
|
|
|
> taxa <- assignTaxonomy(seqtab.nochim, "silva_nr_v132_train_set.fa.gz", multithread=8)
|
|
> taxa <- assignTaxonomy(seqtab.nochim, "silva_nr99_v138_train_set.fa.gz", multithread=8)
|
|
|
|
|
|
Now lets save our results, so that we can use them again later without having to do all the calculations again:
|
|
Now lets save our results, so that we can use them again later without having to do all the calculations again:
|
|
|
|
|
... | @@ -165,7 +165,7 @@ then do the plotting :) |
... | @@ -165,7 +165,7 @@ then do the plotting :) |
|
So the bit above creates the function called `plotTaxon()`, which we can then use to generate the plots. We're going to create a pdf of the plot by first opening a pdf file with the `pdf()` command, then run the `plotTaxon()` function, then switch off the plotting device.
|
|
So the bit above creates the function called `plotTaxon()`, which we can then use to generate the plots. We're going to create a pdf of the plot by first opening a pdf file with the `pdf()` command, then run the `plotTaxon()` function, then switch off the plotting device.
|
|
|
|
|
|
> pdf("Marmic-amplicon-plot.pdf")
|
|
> pdf("Marmic-amplicon-plot.pdf")
|
|
> plotTaxon(seqtab.nochim, taxa, "Bacteroidetes", 0.01, "Genus")
|
|
> plotTaxon(seqtab.nochim, taxa, "Bacteroidota", 0.01, "Genus")
|
|
> dev.off()
|
|
> dev.off()
|
|
|
|
|
|
Once you have your plot, you can open a new terminal (don't quit R because we will want to create different plots for different taxa - just change the file name for the pdf) and run:
|
|
Once you have your plot, you can open a new terminal (don't quit R because we will want to create different plots for different taxa - just change the file name for the pdf) and run:
|
... | | ... | |