... | ... | @@ -21,13 +21,13 @@ What are fnFs and fnRs? Think about why we might want to make lists of files. (R |
|
|
|
|
|
> sample.names <- sapply(strsplit(basename(fnFs), ".R"), `[`, 1)
|
|
|
|
|
|
This one's a bit funny syntactically. Use the ? to try and figure out what sapply does. The oddest bit is that single square bracket '['. In R, the square bracket is actually a function much like '+' and '-'! It is actually the subset function here.
|
|
|
This one's a bit funny syntactically. Use ?sapply to figure out what sapply does. The oddest bit is the single square bracket '['. In R, the square bracket is actually a function, much like '+' and '-'! Here it is the subsetting function: sapply applies it to each split file name to pull out the first piece.
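To see the square bracket acting as a function, here is a minimal sketch in base R. The file names and the `_demo` variables are made up for illustration and are not the tutorial's data. Note that `strsplit` treats its pattern as a regular expression by default, so the `.` in ".R" matches any character; the sketch passes `fixed = TRUE` to split on the literal text instead, which assumes the file names really contain ".R".

```r
# Hypothetical file paths, for illustration only
fnFs_demo <- c("/data/sampleA.R1.fastq", "/data/sampleB.R1.fastq")

# basename() drops the directory; strsplit() returns one character vector per file
parts <- strsplit(basename(fnFs_demo), ".R", fixed = TRUE)

# Calling `[` as an ordinary function: `[`(x, 1) is the same as x[1]
first_a <- `[`(parts[[1]], 1)

# sapply() calls `[`(element, 1) on every element of the list
sample_names_demo <- sapply(parts, `[`, 1)
sample_names_demo
```

The extra argument `1` after the function name is passed on by sapply to every call, which is why the result holds the first piece of each file name.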
|
|
|
|
|
|
Now that we have our file names, we can look at the quality scores of the reads:
|
|
|
|
|
|
> plotQualityProfile(fnFs[1:2])
|
|
|
|
|
|
Then modify the above to look at the reverse reads too. How are the scores different between read 1 and read 2?
|
|
|
Modify the command above to look at the other samples and at the reverse reads. How do the quality scores differ between read 1 and read 2?
|
|
|
|
|
|
Next we want to make file paths for the filtering and trimming step that will be coming up.
|
|
|
|
... | ... | @@ -39,4 +39,13 @@ Next we want to make file paths for the filtering and trimming step that will be |
|
|
|
|
|
Now we move on to actually filtering and quality trimming our reads. Again, use ? to see what filterAndTrim does and what its various options are.
|
|
|
|
|
|
> filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN=0, maxEE=c(5,5), compress=TRUE, multithread=8)
|
|
> out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN=0, maxEE=c(5,5), compress=TRUE, multithread=8)
|
|
|
|
|
|
What is maxEE? Feel free to try out other options like truncQ and truncLen to see if they change the output.
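As a hint for thinking about maxEE: the DADA2 documentation defines the "expected errors" of a read as EE = sum(10^(-Q/10)), where Q are the read's Phred quality scores. The sketch below works through that formula in base R. The quality scores are invented for illustration, and the threshold of 5 simply mirrors the maxEE=c(5,5) setting used above.

```r
# Hypothetical Phred quality scores for one very short read (illustration only)
quals <- c(30, 30, 20, 10)

# Per-base error probability implied by each Phred score: p = 10^(-Q/10)
err_prob <- 10^(-quals / 10)

# Expected errors: the sum of the per-base error probabilities
ee <- sum(err_prob)
ee

# Reads with more expected errors than maxEE are discarded,
# so this read is kept when ee does not exceed the threshold
keep <- ee <= 5
keep
```

Because low-quality bases contribute most of the sum, a handful of poor bases at the end of a read can dominate its expected errors, which is one reason truncating reads can change how many pass the filter.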
|
|
|
Take a look at what's in the 'out' object and see how it's affected by different trimming parameters.
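One way to compare trimming settings is to turn the read counts into a percentage retained. The sketch below uses a mock matrix in place of the real 'out' object, assuming it has one row per input file and the columns reads.in and reads.out; the file names and counts are made up, so check them against your own output.

```r
# Mock stand-in for 'out' (hypothetical file names and read counts)
out_demo <- matrix(
  c(1000,  900,
    2000, 1500),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("sampleA.R1.fastq", "sampleB.R1.fastq"),
                  c("reads.in", "reads.out"))
)

# Percentage of reads that survived filtering, per sample
pct_kept <- round(100 * out_demo[, "reads.out"] / out_demo[, "reads.in"], 1)

# Show the counts and the percentage side by side
cbind(out_demo, pct.kept = pct_kept)
```

Running the same calculation on the real 'out' after each change of parameters makes it easier to see which settings discard the most reads.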
|
|
|
|
|
|
Now we need to learn the error rates of our data. From the DADA2 documentation:
|
|
|
"The DADA2 algorithm makes use of a parametric error model (err) and every amplicon dataset has a different set of error rates. The `learnErrors` method learns this error model from the data, by alternating estimation of the error rates and inference of sample composition until they converge on a jointly consistent solution. As in many machine-learning problems, the algorithm must begin with an initial guess, for which the maximum possible error rates in this data are used (the error rates if only the most abundant sequence is correct and all the rest are errors)."
|
|
|
|
|
|
|
|
|
|