... | @@ -33,7 +33,7 @@ Now we have our file names, we can look at the quality values of the reads. |
|
|
|
|
|
Modify the arguments passed to `plotQualityProfile()` above to look at the other samples, and also at the reverse reads. How do the quality scores differ between read 1 and read 2?
|
|
|
|
|
|
|
|
Next we want to make file paths for the filtering and trimming step that will be coming up.
|
|
Next, we want to make file paths for the filtering and trimming step that will be coming up.
|
|
|
|
|
|
> filtFs <- file.path(path, "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
|
|
|
|
> filtRs <- file.path(path, "filtered", paste0(sample.names, "_R_filt.fastq.gz"))
|
|
|
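To see what these two commands produce, here is a minimal sketch in base R. The values of `path` and `sample.names` below are hypothetical placeholders for the objects created earlier in the tutorial, and naming the vectors by sample is an optional convenience rather than something the steps above require.

```r
# Hypothetical stand-ins for the `path` and `sample.names` objects defined earlier.
path <- "MiSeq_run"
sample.names <- c("sampleA", "sampleB")

# Same construction as in the tutorial: <path>/filtered/<sample>_F_filt.fastq.gz
filtFs <- file.path(path, "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
filtRs <- file.path(path, "filtered", paste0(sample.names, "_R_filt.fastq.gz"))

# Optional: name each path by its sample so it can be looked up by name later.
names(filtFs) <- sample.names
names(filtRs) <- sample.names

print(filtFs)
print(filtRs)
```

Note that these are only character vectors of file names. Nothing is written to disk at this point.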
... | @@ -45,7 +45,8 @@ Now we move on to actually filtering and quality trimming our reads. Again use ` |
|
> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN=0, maxEE=c(5,5), compress=TRUE, multithread=8)
|
|
|
|
|
|
|
|
What is `maxEE`? Feel free to try other options such as `truncQ` and `truncLen` to see how they change the output.
|
|
|
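As a hint for the `maxEE` question: the "expected errors" of a read are the sum of the per-base error probabilities implied by its Phred quality scores, EE = sum(10^(-Q/10)), and reads whose EE exceeds `maxEE` are discarded. The sketch below computes this in base R for two made-up reads. The helper `expected_errors()` and the quality values are illustrative assumptions, not part of DADA2 or of this dataset.

```r
# Expected errors of one read: sum of per-base error probabilities 10^(-Q/10).
# `expected_errors` is a hypothetical helper written for this illustration.
expected_errors <- function(quals) sum(10^(-quals / 10))

# Two made-up 150-base reads (illustrative values, not real data).
good_read <- rep(30, 150)                 # every base at Q30
poor_read <- c(rep(30, 100), rep(10, 50)) # quality drops to Q10 at the 3' end

ee_good <- expected_errors(good_read)
ee_poor <- expected_errors(poor_read)

# Compare against the threshold used in the filterAndTrim() call above.
max_ee <- 5
print(c(good = ee_good, poor = ee_poor))
print(c(good_passes = ee_good <= max_ee, poor_passes = ee_poor <= max_ee))
```

Because low-quality bases contribute most of the expected errors, truncating the poor-quality tail of a read (for example with `truncLen`) can bring it back under the threshold.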
|
Take a look at what's in the `out` object and see how it's affected by different trimming parameters.
|
|
|
|
|
|
Have a look at the `out` object; it tells you how many reads passed quality trimming and filtering. If you think you lost too many reads, try relaxing the parameters.
|
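One way to judge whether too many reads were lost is to turn the counts into a percentage retained per sample. `filterAndTrim()` returns a matrix with `reads.in` and `reads.out` columns. The sketch below uses a made-up matrix of that shape (`out_example`, with hypothetical file names and counts) so it runs without any sequencing data; with real data you would use `out` in its place.

```r
# Made-up counts shaped like the matrix returned by filterAndTrim().
# File names and numbers are hypothetical.
out_example <- matrix(
  c(10000, 9200,
    12000, 6000),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("sampleA_R1.fastq.gz", "sampleB_R1.fastq.gz"),
                  c("reads.in", "reads.out"))
)

# Percentage of reads that survived filtering, per sample.
pct_retained <- round(100 * out_example[, "reads.out"] / out_example[, "reads.in"], 1)

print(cbind(out_example, pct_retained))
```

A sample that retains far fewer reads than the others is a good candidate for a second look at its quality profile before changing parameters for the whole run.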
|
|
|
|
|
Now we need to learn the error rates of our data. From the DADA2 documentation:
|
|
|
|
"The DADA2 algorithm makes use of a parametric error model (err) and every amplicon dataset has a different set of error rates. The `learnErrors` method learns this error model from the data, by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors)." This step takes a while without producing on-screen output, so don't worry if nothing happens for a good few minutes.
|
|
|
... | | ... | |