Updated Practical 3 Processing 16S rRNA amplicon data (markdown) authored by Ben Francis
@@ -18,9 +18,27 @@ The first parts of the process are now just preparing the files and file names.
> path <- "."
> fnFs <- sort(list.files(path, pattern="R1.fastq.gz", full.names = TRUE))
> fnRs <- sort(list.files(path, pattern="R2.fastq.gz", full.names = TRUE))
What are fnFs and fnRs? Think about why we might want to make lists of files. (Remember how we used loops earlier to cycle through lists of files. In R we may not need a loop at all if we have a vector of items.)
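As a rough illustration of the point about vectors, the sketch below builds a scratch directory of empty files and lists them the same way. The directory and file names (`fastq_demo`, `sampleA`, `sampleB`) are made up for this example and are not the practical's data.

```r
# Create a scratch directory with a few empty, hypothetically named files
demo_dir <- file.path(tempdir(), "fastq_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- c("sampleA.R1.fastq.gz", "sampleA.R2.fastq.gz",
                "sampleB.R1.fastq.gz", "sampleB.R2.fastq.gz")
invisible(file.create(file.path(demo_dir, demo_files)))

# list.files() returns a character vector, one element per matching file.
# Note that pattern is a regular expression, not a shell wildcard.
demo_fnFs <- sort(list.files(demo_dir, pattern = "R1.fastq.gz", full.names = TRUE))

# Many R functions are vectorised: one call handles every file name at once,
# so no explicit loop is needed here
print(basename(demo_fnFs))
print(length(demo_fnFs))
```

Try `class(demo_fnFs)` as well, then compare with what you get for fnFs in your own session.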
> sample.names <- sapply(strsplit(basename(fnFs), ".R", fixed = TRUE), `[`, 1)
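This one's a bit odd syntactically. Use ?sapply to figure out what sapply does. The oddest bit is the lone square bracket `[`. In R, the square bracket is actually a function, much like '+' and '-'! Here it is the subset function: sapply calls it on each element of the list returned by strsplit, with 1 as the extra argument, so we keep the first piece of each split file name.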
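To see the square-bracket trick in isolation, here is a small sketch on two made-up file names (the names are assumptions for illustration only). It uses `fixed = TRUE` so that the dot in ".R" is matched literally rather than as a regular-expression wildcard.

```r
# Hypothetical file names standing in for basename(fnFs)
demo_names <- c("sampleA.R1.fastq.gz", "sampleB.R1.fastq.gz")

# strsplit() returns a list with one character vector per file name
pieces <- strsplit(demo_names, ".R", fixed = TRUE)
print(pieces)

# `[` is the subset function; sapply() calls `[`(x, 1) for each list element
demo_samples <- sapply(pieces, `[`, 1)

# The same thing written with an explicit anonymous function
demo_samples_long <- sapply(pieces, function(x) x[1])

print(demo_samples)
print(identical(demo_samples, demo_samples_long))
```

The short form with the backticked bracket is just a more compact way of writing the anonymous function.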
Now that we have our file names, we can look at the quality values of the reads:
> plotQualityProfile(fnFs[1:2])
Then modify the above to look at the reverse reads too. How are the scores different between read 1 and read 2?
Next we want to make file paths for the filtering and trimming step that will be coming up.
> filtFs <- file.path(path, "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
> filtRs <- file.path(path, "filtered", paste0(sample.names, "_R_filt.fastq.gz"))
> names(filtFs) <- sample.names
> names(filtRs) <- sample.names
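The path-building step above relies on paste0 and file.path both being vectorised, and on names() letting us look files up by sample. A minimal sketch with two invented sample names (assumptions, not the practical's samples) shows what the resulting vector looks like:

```r
# Hypothetical sample names standing in for sample.names
demo_samples <- c("sampleA", "sampleB")
demo_path <- "."

# paste0() appends the suffix to every sample name; file.path() joins the
# pieces with "/" for each element in turn
demo_filtFs <- file.path(demo_path, "filtered",
                         paste0(demo_samples, "_F_filt.fastq.gz"))

# Naming the vector lets us index by sample instead of by position
names(demo_filtFs) <- demo_samples

print(demo_filtFs)
print(demo_filtFs[["sampleB"]])
```

Nothing is written to disk at this point; these are only the names that the filtered files will be given in the next step.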
Now we move on to actually filtering and quality trimming our reads. Again, use ?filterAndTrim to see what the function does and what the various options mean.
> filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN=0, maxEE=c(5,5), compress=TRUE, multithread=8)
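The maxEE option is easier to interpret with a worked example. The expected number of errors in a read is the sum of the per-base error probabilities, where a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below computes this in base R for one read with made-up quality scores. The helper name `expected_errors` is hypothetical and is not part of dada2.

```r
# Hypothetical helper: expected errors from a vector of Phred quality scores
expected_errors <- function(q) {
  sum(10^(-q / 10))
}

# Made-up quality scores for a very short read
demo_q <- c(40, 40, 30, 20, 10)

# Per-base error probabilities, then their sum
demo_probs <- 10^(-demo_q / 10)
demo_ee <- expected_errors(demo_q)

print(demo_probs)
print(demo_ee)

# A read is kept only if its expected errors do not exceed the threshold
demo_maxEE <- 5
print(demo_ee <= demo_maxEE)
```

In the call above, maxEE=c(5,5) sets the threshold to 5 for the forward reads and 5 for the reverse reads. Note how a single low-quality base (Q10) contributes far more to the total than several high-quality ones.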