... | ... | @@ -8,6 +8,8 @@ |
What is the output of `kallisto`?
<br>
Sort the `abundance.tsv` table by the 5th column (TPM) and extract the IDs of the 25 most transcribed genes.
First, we sort `abundance.tsv` by TPM and remove all genes with a TPM of 0.
... | ... | @@ -18,6 +20,7 @@ tail -n +2 abundance.tsv | awk '$5 != 0 {print $0}' | sort -g -r -k5 > abundance |
Can you tell how the command works?
Try using an AI language model like ChatGPT or Google Gemini to explain the three parts.
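
To check your explanation, you can also run the same three-part pipeline on a tiny table and watch what each stage does. The sketch below is only an illustration: the file names `toy_abundance.tsv` and `toy_sorted.tsv` and all values in the table are made up, and the column layout is assumed to match `abundance.tsv` (ID in column 1, TPM in column 5).

```shell
# Build a small, made-up abundance table (hypothetical values, tab-separated).
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' > toy_abundance.tsv
printf 'geneA\t1000\t800\t10\t5.5\n' >> toy_abundance.tsv
printf 'geneB\t1200\t1000\t0\t0\n' >> toy_abundance.tsv
printf 'geneC\t900\t700\t50\t120.25\n' >> toy_abundance.tsv
printf 'geneD\t1500\t1300\t3\t1e-2\n' >> toy_abundance.tsv

# Part 1: tail -n +2 starts printing at line 2, which skips the header.
# Part 2: awk keeps only rows whose 5th column (TPM) is not 0.
# Part 3: sort -g compares general numeric values (including 1e-2),
#         -r reverses the order (largest first), -k5 sorts on column 5.
tail -n +2 toy_abundance.tsv | awk '$5 != 0 {print $0}' | sort -g -r -k5 > toy_sorted.tsv

# Show the result: geneB (TPM 0) should be gone, highest TPM on top.
cat toy_sorted.tsv
```

Running the three parts one after another (first only `tail`, then `tail | awk`, then the full pipeline) makes it easy to see which part is responsible for which change.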
<br>
Extract the gene names (first column) of the top 25 most transcribed genes from our sorted file.
... | ... | @@ -26,6 +29,8 @@ Extract the gene names (first column) of the top 25 most transcribed genes from |
```
The first part of this one-liner reads only the first 25 lines of our sorted abundance file. The second part takes these 25 lines and keeps only the first column (`-f1`).
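
The two parts described above can be tried out on a small, already sorted table. This is a minimal sketch under assumptions: the file names `toy_sorted.tsv` and `toy_top_ids.txt` and the table values are hypothetical, and it takes the top 2 lines instead of the top 25 so the effect is visible on a four-line file.

```shell
# Made-up table that is already sorted by TPM (column 5), highest first.
printf 'geneC\t900\t700\t50\t120.25\n' > toy_sorted.tsv
printf 'geneA\t1000\t800\t10\t5.5\n' >> toy_sorted.tsv
printf 'geneE\t700\t500\t4\t2.0\n' >> toy_sorted.tsv
printf 'geneD\t1500\t1300\t3\t0.01\n' >> toy_sorted.tsv

# Part 1: head -n 2 reads only the first 2 lines (use 25 for the real file).
# Part 2: cut -f1 keeps only the first tab-separated column, the gene ID.
head -n 2 toy_sorted.tsv | cut -f1 > toy_top_ids.txt

# One ID per line, in the same order as the sorted table.
cat toy_top_ids.txt
```

`cut` splits on tabs by default, which is why no delimiter option is needed for a `.tsv` file.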
<br>
Next up, we use the names of our 25 most transcribed genes to extract the corresponding amino acid sequences from an already predicted proteome of Gamma1, which you can find here:
```
... | ... | |