... | ... | @@ -4,4 +4,104 @@ |
|
|
|
|
|
* Comparing expression between treatment and control libraries
|
|
|
|
|
|
* Creating a heatmap to identify differences in transcription between conditions
|
|
|
|
|
|
* Bonus: Identify pathway and upstream/downstream proteins
|
|
|
|
|
|
|
|
|
|
|
|
#### 1) Inferring and visualizing protein trees
|
|
|
|
|
|
But first, activate our `conda` environment:
|
|
|
|
|
|
```
|
|
|
$ conda activate marmic2024
|
|
|
```
|
|
|
|
|
|
Choose which protein sequences you want to show in the tree. Include at least the query reference sequences and the corresponding sequences you found Gamma1. To get more information also include closely and not so closely related BLASTp (online) matches. You can download these matches as `.fasta` files using the browser tool.
|
|
|
|
|
|
Concatenate all sequence files you want in your tree:
|
|
|
|
|
|
```bash
|
|
|
$ cat your_interesting_proteins.fasta your_interesting_genes_aa.fasta blastp_reference_proteins.fasta > your_interesting_proteins_n_references.fasta
|
|
|
```
|
|
|
*__Note:__ The first file contains the query sequences we provided, the second one consists of your corresponding Gamma1 sequences and the last one contains your `BLASTp` reference sequences.*
|
|
|
|
|
|
To infer the tree we need to align all sequences. We will be using `MAFFT`, a standard program to create multiple sequence alignments (MSA) from `.fasta` files:
|
|
|
|
|
|
```bash
|
|
|
$ mafft --maxiterate 1000 --localpair your_interesting_proteins_n_references.fasta > your_interesting_proteins_n_references_aligned.fasta
|
|
|
```
|
|
|
|
|
|
*__Note:__ The options `--maxiterate` and `--localpair` run a local alignment iteratively and pairwise. See this [website](https://open.oregonstate.education/appliedbioinformatics/chapter/chapter-3/) if you are interested in an in-depth explanation of the differences between local (Smith-Waterman) and global (Needleman-Wunsch) sequence alignment algorithms.*
|
|
|
|
|
|
Now run `IQTree` to infer the phylogeny:
|
|
|
|
|
|
```
|
|
|
$ iqtree2 -s your_interesting_proteins_n_references_aligned_trimmed.fasta --alrt 1000 -B 1000
|
|
|
```
|
|
|
*__Note:__ `-alrt` specifies the number of bootstrap replicates for an SH-like approximate likelihood ratio test *
|
|
|
|
|
|
|
|
|
You can visualize your `IQtree` output (`.treefile`) using [iTOL tree viewer](https://itol.embl.de/upload.cgi).
|
|
|
|
|
|
<br>
|
|
|
|
|
|
#### 2) Visualizing transcriptional patterns using heatmaps
|
|
|
|
|
|
On day 6 you prepared the following tab separated file with all TPMs of the genes that your group works on:
|
|
|
|
|
|
*your_interesting_genes_TPM_all_libraries.tsv*
|
|
|
| gene | 5814_A| 5814_B | **...** |
|
|
|
| ------ | ------ |------ |------ |
|
|
|
| Gene 1 | | | |
|
|
|
| Gene 2 | | | |
|
|
|
| Gene 3 | | | |
|
|
|
| **...** | | | |
|
|
|
|
|
|
<br>
|
|
|
|
|
|
|
|
|
|
|
|
Copy the `R` script below to your personal directory and use it to create a heat map of your TPM values. You can also find it as `TPM_heatmap.R` in our `./software` course directory. You can either open and run the script in `RStudio` (deactivate the `conda` environment first to avoid errors), or run it in the command line:
|
|
|
|
|
|
```
|
|
|
cp /path/to/symbiosis_bioinfo-2024/software/TPM_heatmap.R /path/to/personal/directory/
|
|
|
rstudio .../personal/directory/TPM_heatmap.R
|
|
|
Rscript .../personal/directory/TPM_heatmap.R
|
|
|
```
|
|
|
*__Note:__ Remember to change the input file name in line 4 accordingly, for example with `nano`.*
|
|
|
|
|
|
|
|
|
```r
|
|
|
#install.packages("pheatmap")
|
|
|
#install.packages("RColorBrewer")
|
|
|
|
|
|
library(pheatmap)
|
|
|
library(RColorBrewer)
|
|
|
|
|
|
tpm <- read.table(file = '/path/to/your_interesting_genes_TPM_all_libraries.tsv', sep = '\t', header = TRUE, row.names = "gene")
|
|
|
tpm_num = as.matrix(tpm[,1:8])
|
|
|
|
|
|
colnames(tpm_num) <- c("5814_A",
|
|
|
"5814_B",
|
|
|
"5814_C",
|
|
|
"5814_E",
|
|
|
"5814_T",
|
|
|
"5814_R",
|
|
|
"5814_S",
|
|
|
"5814_Q")
|
|
|
|
|
|
cal_z_score <- function(x){
|
|
|
(x - mean(x)) / sd(x)
|
|
|
}
|
|
|
|
|
|
tpm_num_norm <- t(apply(tpm_num, 1, cal_z_score))
|
|
|
pheatmap(tpm_num_norm, cluster_rows=TRUE, cluster_cols=TRUE)
|
|
|
```
|
|
|
|
|
|
<br>
|
|
|
|
|
|
This is how a heatmap and a phylogenetic tree of our genes could look like:
|
|
|
|
|
|
 |