... | @@ -52,7 +52,7 @@ What is maxEE? Feel free to try out other options like truncQ and truncLen to se |
... | @@ -52,7 +52,7 @@ What is maxEE? Feel free to try out other options like truncQ and truncLen to se |
|
Take a look at what's in the 'out' object and see how it's affected by different trimming parameters.
|
|
Take a look at what's in the 'out' object and see how it's affected by different trimming parameters.
|
|
|
|
|
|
Now we need to learn the error rates of our data. From the DADA2 documentation:
|
|
Now we need to learn the error rates of our data. From the DADA2 documentation:
|
|
"The DADA2 algorithm makes use of a parametric error model (err) and every amplicon dataset has a different set of error rates. The `learnErrors` method learns this error model from the data, by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors)."
|
|
"The DADA2 algorithm makes use of a parametric error model (err) and every amplicon dataset has a different set of error rates. The `learnErrors` method learns this error model from the data, by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors)." This step takes a while without producing on-screen output, so don't worry if nothing happens for a good few minutes.
|
|
|
|
|
|
> errF <- learnErrors(filtFs, multithread=8)
|
|
> errF <- learnErrors(filtFs, multithread=8)
|
|
> errR <- learnErrors(filtRs, multithread=8)
|
|
> errR <- learnErrors(filtRs, multithread=8)
|
... | @@ -79,7 +79,7 @@ The next step is to make the ASV table (formatted like a standard OTU table with |
... | @@ -79,7 +79,7 @@ The next step is to make the ASV table (formatted like a standard OTU table with |
|
|
|
|
|
> seqtab <- makeSequenceTable(mergers)
|
|
> seqtab <- makeSequenceTable(mergers)
|
|
|
|
|
|
Then we need to remove chimeric sequences. DADA2 does this by aligning sequences to each other and checking if any have two "parents" - i.e. they match perfectly over some of their length to two different unique sequences. This step takes an age, so if it's late in the day just leave it running and we can return to it tomorrow.
|
|
Then we need to remove chimeric sequences. DADA2 does this by aligning sequences to each other and checking if any have two "parents" - i.e. they match perfectly over some of their length to two different unique sequences. This step also takes a bit of time, so just let it do its thing.
|
|
|
|
|
|
> seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=8, verbose=TRUE)
|
|
> seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=8, verbose=TRUE)
|
|
|
|
|
... | @@ -93,7 +93,7 @@ Don't worry about what this code is actually doing (clearly the developer knew w |
... | @@ -93,7 +93,7 @@ Don't worry about what this code is actually doing (clearly the developer knew w |
|
> head(track)
|
|
> head(track)
|
|
|
|
|
|
How do the numbers look? It's ok to lose a few more to chimeras than the other steps, so long as there's not anywhere else that we lose more than 50% of our reads.
|
|
How do the numbers look? It's ok to lose a few more to chimeras than the other steps, so long as there's not anywhere else that we lose more than 50% of our reads.
|
|
The final step is then to classify the sequences, using the silva database we downloaded before. This also takes a while, so again, we might start it and then let it run while we do something else.
|
|
The final step is then to classify the sequences, using the silva database we downloaded before. This also takes a while (half an hour maybe), so again, we might start it and then let it run while we do something else.
|
|
|
|
|
|
> taxa <- assignTaxonomy(seqtab.nochim, "silva_nr99_v138_train_set.fa.gz", multithread=8)
|
|
> taxa <- assignTaxonomy(seqtab.nochim, "silva_nr99_v138_train_set.fa.gz", multithread=8)
|
|
|
|
|
... | @@ -107,12 +107,12 @@ Now that we have these outputs, we need to make them useful! One way to do that |
... | @@ -107,12 +107,12 @@ Now that we have these outputs, we need to make them useful! One way to do that |
|
For now though, we're just going to use a (relatively) simple plotting function to inspect the sequence diversity in our samples. If you get bored you can try and puzzle through what the plot function is doing and start changing things. But if you're not bothered, you still get some nice plots :)
|
|
For now though, we're just going to use a (relatively) simple plotting function to inspect the sequence diversity in our samples. If you get bored you can try and puzzle through what the plot function is doing and start changing things. But if you're not bothered, you still get some nice plots :)
|
|
Like I said at the beginning, installation of DADA2 wasn't easy, and so in this R installation we don't get plots appearing on screen, we have to write them to file and look at them separately. I know.
|
|
Like I said at the beginning, installation of DADA2 wasn't easy, and so in this R installation we don't get plots appearing on screen, we have to write them to file and look at them separately. I know.
|
|
|
|
|
|
# Change of plans
|
|
# If you didn't make it to the plotting stage on Monday, start from here on Tuesday
|
|
We're all going to start today by making some plots. Sorry that the full dada processing took longer than I anticipated.
|
|
We're all going to start today by making some plots. Sorry if the full dada processing took too long and you didn't get to run it all yourself. If you didn't, all you missed was running some R code and inspecting big unwieldy tables.
|
|
Start by getting the full processed dada data:
|
|
Start by getting the full processed dada data:
|
|
|
|
|
|
$ cp /bioinf/home/tfrancis/marmic_NGS2020/16S_amplicons/demultiplexed/seqtab_final.rds .
|
|
$ cp /bioinf/home/tfrancis/marmic2021/demultiplexed/seqtab_final.rds .
|
|
$ cp /bioinf/home/tfrancis/marmic_NGS2020/16S_amplicons/demultiplexed/taxa_final.rds .
|
|
$ cp /bioinf/home/tfrancis/marmic2021/demultiplexed/taxa_final.rds .
|
|
|
|
|
|
Then R
|
|
Then R
|
|
|
|
|
... | @@ -174,4 +174,4 @@ Once you have your plot, you can open a new terminal (don't quit R because we wi |
... | @@ -174,4 +174,4 @@ Once you have your plot, you can open a new terminal (don't quit R because we wi |
|
|
|
|
|
to view it. Happy plotting!
|
|
to view it. Happy plotting!
|
|
|
|
|
|
(Hint - try Deltaproteobacteria and Epsilonbacteraeota and see what inferences you can make about your samples) |
|
(Hint - try Deltaproteobacteria and Campilobacterota and see what inferences you can make about your samples) |