@@ -42,7 +42,6 @@ seqtk subseq -l 80 Sym_Oalg_Gamma1_genome_cds_aa.fasta 5814_X_top25_expressed_ge
### 2) Annotate the most transcribed sequences

**Annotate the protein with multiple databases to confirm its function**

To annotate our amino acid sequences, we use the Pfam database, a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

`Pfam-A.hmm` is a library containing the Pfam-A HMMs; it can be used as the profile set that `hmmsearch` searches against our protein sequences.

@@ -61,3 +60,5 @@ There are two output files `.full.txt` and a tabular summary of it `.tab.txt`.
Have a look at both, but we will use the `.tab.txt` file to check for our matches.
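
To make the tabular file easier to scan, the matches can be reduced to a few columns. The following is only a minimal sketch: it assumes the `.tab.txt` file follows the `hmmsearch --tblout` layout (comment lines starting with `#`, then whitespace-separated columns with the protein name in column 1, the Pfam family name in column 3, and the full-sequence E-value in column 5). The function name `summarize_tblout` and the example file name are hypothetical and not part of the tutorial.

```shell
# Hypothetical helper: print protein ID, Pfam family, and full-sequence E-value
# from a table assumed to be in hmmsearch --tblout format.
summarize_tblout() {
    # Skip '#' comment lines and any line with fewer than 5 columns.
    # $1 = target (protein) name, $3 = query (Pfam family) name,
    # $5 = full-sequence E-value.
    awk '!/^#/ && NF >= 5 { print $1 "\t" $3 "\t" $5 }' "$1"
}

# Example call; replace the placeholder with the name of your own .tab.txt file:
# summarize_tblout my_hmmsearch_output.tab.txt
```

If your table was produced with different options, check the header comment lines in the file first and adjust the column numbers accordingly.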

What are these proteins, and in which metabolic pathways do they participate?

Try using [BLASTp](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) to double-check the function of the proteins identified in `5814_X_top25_expressed_genes_aa.fasta`.
\ No newline at end of file