... | ... | @@ -18,9 +18,27 @@ The first parts of the process are now just preparing the files and file names. |
|
|
> path <- "."
|
|
|
> fnFs <- sort(list.files(path, pattern="R1.fastq.gz", full.names = TRUE))
|
|
|
> fnRs <- sort(list.files(path, pattern="R2.fastq.gz", full.names = TRUE))
|
|
|
|
|
|
What are fnFs and fnRs? Think about why we might want to collect the file names together. (Remember how we used loops earlier to cycle through lists of files. In R we often won't need a loop if we have a vector of items, because many functions work on the whole vector at once.)
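
As a small sketch of that idea, the comparison below runs the same operation once with a loop and once on the whole vector. The file names are made up for illustration and are not the real course data.

```r
# Hypothetical file names, standing in for what list.files() might return
files <- c("./sampleA.R1.fastq.gz", "./sampleB.R1.fastq.gz")

# Loop version: handle one file name at a time
stripped_loop <- character(length(files))
for (i in seq_along(files)) {
  stripped_loop[i] <- basename(files[i])
}

# Vectorised version: basename() accepts the whole vector in one call
stripped_vec <- basename(files)

# Both approaches give the same result
identical(stripped_loop, stripped_vec)
```

Both versions produce the same character vector; the vectorised call is simply shorter.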
|
|
|
|
|
|
> sample.names <- sapply(strsplit(basename(fnFs), ".R", fixed = TRUE), `[`, 1)
|
|
|
|
|
|
This one's a bit funny syntactically. Use ?sapply to try and figure out what sapply does. The oddest bit is that single square bracket '['. In R, the square bracket is actually a function, much like '+' and '-'! Here it is the subset function: it pulls the first piece out of each split file name.
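
To make the square-bracket trick concrete, here is a minimal sketch using made-up file names. It passes `fixed = TRUE` to strsplit so that the dot in ".R" is matched literally rather than as a regular-expression wildcard.

```r
# Hypothetical file names, made up for illustration
files <- c("sampleA.R1.fastq.gz", "sampleB.R1.fastq.gz")

# strsplit() returns a list with one character vector per file name
pieces <- strsplit(files, ".R", fixed = TRUE)

# `[` called like any other function: the same as pieces[[1]][1]
first_piece <- `[`(pieces[[1]], 1)

# sapply() calls `[`(x, 1) on every list element and simplifies to a vector
sample_names <- sapply(pieces, `[`, 1)

# Equivalent, more explicit form using an anonymous function
sample_names_explicit <- sapply(pieces, function(x) x[1])
```

The anonymous-function form does the same job and may be easier to read while you are getting used to sapply.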
|
|
|
|
|
|
Now that we have our file names, we can look at the quality values of the reads:
|
|
|
|
|
|
> plotQualityProfile(fnFs[1:2])
|
|
|
|
|
|
Then modify the above to look at the reverse reads too. How are the scores different between read 1 and read 2?
|
|
|
|
|
|
Next we want to build the output file paths that the upcoming filtering and trimming step will write its filtered reads to.
|
|
|
|
|
|
> filtFs <- file.path(path, "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
|
|
|
> filtRs <- file.path(path, "filtered", paste0(sample.names, "_R_filt.fastq.gz"))
|
|
|
> names(filtFs) <- sample.names
|
|
|
> names(filtRs) <- sample.names
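
The path-building commands above can be tried out on their own. This is a sketch with hypothetical sample names (`samples` stands in for sample.names) showing what paste0, file.path and names each contribute.

```r
# Hypothetical sample names, standing in for sample.names
samples <- c("sampleA", "sampleB")

# paste0() is vectorised: one output file name per sample
filt_names <- paste0(samples, "_F_filt.fastq.gz")

# file.path() joins the pieces with "/" and is vectorised too
filt_paths <- file.path(".", "filtered", filt_names)

# Naming the vector lets us look up a path by sample name
names(filt_paths) <- samples
filt_paths[["sampleB"]]
```

Note that nothing is created on disk here; these are only character strings describing where the filtered files will go.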
|
|
|
|
|
|
|
|
|
Now we move on to actually filtering and quality trimming our reads. Again, use ?filterAndTrim to see what the function does and what each of the options used here means.
|
|
|
|
|
|
> filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN=0, maxEE=c(5,5), compress=TRUE, multithread=8) |
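
One option worth unpacking is maxEE, the maximum number of "expected errors" allowed in a read. The sketch below shows how expected errors follow from Phred quality scores, using the standard definition EE = sum(10^(-Q/10)). The quality values are invented for illustration, and the code does not call dada2 itself.

```r
# Hypothetical Phred quality scores for one short read
quals <- c(30, 30, 20, 10)

# Per-base error probability implied by each Phred score
p_err <- 10^(-quals / 10)

# Expected number of errors in the read: the sum of those probabilities
ee <- sum(p_err)

# A read is discarded only if its expected errors exceed maxEE
max_ee <- 5
keep <- ee <= max_ee
```

A single low-quality base (Q10 here) contributes far more to the total than several high-quality ones, which is why trimming poor-quality tails lets more reads pass the filter.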