What format should the input sequences for phylogeny.fr be in?
We would like to clarify what the FASTA format is. We used it in class, in this homework and in the previous one, but did not explain it explicitly. FASTA is a text file format used to represent sequences. Each sequence in the file consists of:
1) A title (name) line that starts with the character ">"
2) The sequence itself, on the line(s) following the title
A FASTA file can contain more than one sequence. For example:
>sequence_A
ATGGTGCGCTC
>sequence_B
CCTGATAG
>sequence_C
GGGGCTCGCTATGAAG
When running a single sequence in BLAST, you do not need to keep the title line. But if you use a program that requires more than one sequence (such as phylogeny.fr), it is essential to keep the titles so that the program can tell the sequences apart. More on the FASTA format can be found here: http://en.wikipedia.org/wiki/FASTA_format
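To make the structure concrete, here is a minimal sketch of how a program might read a FASTA file: every ">" line starts a new record, and the lines after it are joined into that record's sequence. The function name `parse_fasta` is hypothetical and is not how phylogeny.fr itself is implemented; it only illustrates why the titles matter.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {title: sequence} dictionary.

    A line starting with '>' begins a new record; the rest of that line is the title.
    All following lines up to the next '>' are joined to form the sequence.
    Note: if two records share a title, the later one overwrites the earlier one.
    """
    records: Dict[str, str] = {}
    title: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:
                records[title] = "".join(chunks)  # close the previous record
            title = line[1:].strip()
            chunks = []
        else:
            if title is None:
                raise ValueError("sequence data found before the first '>' title line")
            chunks.append(line)
    if title is not None:
        records[title] = "".join(chunks)  # close the last record
    return records


example = """>sequence_A
ATGGTGCGCTC
>sequence_B
CCTGATAG
>sequence_C
GGGGCTCGCTATGAAG
"""

seqs = parse_fasta(example)
for name, seq in seqs.items():
    print(name, seq)
```

Without the ">" lines, a reader like this would have no way to know where one sequence ends and the next begins, and repeated titles would make records collide; that is why multi-sequence tools need distinct titles.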
Q2.2: What does it mean for the algorithm to *converge*?
PSI-BLAST may run several iterations. When we say it has converged, we mean that an iteration found no new sequences compared with the previous iteration, so running further iterations would not change the results. The number of iterations needed to converge differs between query sequences and between parameter settings.
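The stopping rule can be illustrated with a toy loop. This is a simplified sketch, not PSI-BLAST itself: `search` is a hypothetical stand-in for one iteration (real PSI-BLAST rebuilds a profile from the hits each round, and sequences can also drop out), and the `neighbors` table is made-up data. The loop stops as soon as a round adds nothing new, or when an iteration limit is reached.

```python
from typing import Callable, FrozenSet, Set, Tuple


def run_until_converged(
    search: Callable[[FrozenSet[str]], Set[str]],
    query: str,
    max_iterations: int = 10,
) -> Tuple[FrozenSet[str], int, bool]:
    """Repeat a search round until a round finds nothing new.

    `search` is a hypothetical stand-in for one PSI-BLAST iteration: given the
    current hits, it returns the sequences found in this round.
    Returns (final hits, number of iterations run, converged?).
    """
    hits: FrozenSet[str] = frozenset({query})
    for iteration in range(1, max_iterations + 1):
        updated = hits | frozenset(search(hits))
        if updated == hits:
            # No new sequences were found in this round: the search has converged.
            return hits, iteration, True
        hits = updated
    # The iteration limit was reached before convergence.
    return hits, max_iterations, False


# Toy "database": which sequences each sequence would pull in (made-up data).
neighbors = {
    "query": {"seq1", "seq2"},
    "seq1": {"seq3"},
    "seq2": set(),
    "seq3": set(),
}


def toy_search(current: FrozenSet[str]) -> Set[str]:
    found: Set[str] = set()
    for name in current:
        found |= neighbors.get(name, set())
    return found


hits, iterations, converged = run_until_converged(toy_search, "query")
print(sorted(hits), iterations, converged)
```

Here the first round finds seq1 and seq2, the second finds seq3 through seq1, and the third finds nothing new, so the loop reports convergence after three iterations. A different starting query or a different `neighbors` table would converge after a different number of rounds, mirroring how convergence depends on the sequence and the parameters.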
How do we calculate the cells of a PAM matrix?
Please follow the steps shown in the lecture (slides 18-19). The normalized probabilities mentioned in slide 18 are calculated by counting the occurrences of the relevant substitution and dividing by the total number of possible substitutions (which is given in the exercise). The normalization mentioned in slide 19 should be by the frequency of the amino acid that was changed. For example, to normalize the value for the substitution A->B, divide it by the frequency of A. One more note: since log(0) is undefined, you can replace 0 with a small number such as 0.0001 and continue the calculation.
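The three steps above (probability, normalization by the frequency of A, logarithm) can be sketched for a single cell as follows. The function name `pam_cell` and the example numbers are hypothetical, and the base-10 logarithm is an assumption: use whichever base the lecture slides specify.

```python
import math


def pam_cell(count_ab: int, total_substitutions: int, freq_a: float,
             zero_replacement: float = 0.0001, log_base: float = 10.0) -> float:
    """Compute one PAM-matrix cell for the substitution A -> B.

    1. probability = count(A->B) / total number of possible substitutions
    2. normalize by dividing by the frequency of A (the amino acid that changed)
    3. take the logarithm; a zero value is first replaced by a small number,
       because log(0) is undefined
    The default base-10 logarithm is an assumption, not taken from the lecture.
    """
    if total_substitutions <= 0:
        raise ValueError("total_substitutions must be positive")
    if freq_a <= 0:
        raise ValueError("freq_a must be positive")
    probability = count_ab / total_substitutions
    normalized = probability / freq_a
    if normalized == 0:
        normalized = zero_replacement  # avoid log(0)
    return math.log(normalized, log_base)


# Made-up numbers: 5 observed A->B substitutions out of 100, frequency of A = 0.25.
print(pam_cell(5, 100, 0.25))
# A substitution that was never observed falls back to log(0.0001).
print(pam_cell(0, 100, 0.25))
```

In the first call the probability is 5/100 = 0.05, dividing by the frequency of A gives 0.2, and the cell is log(0.2); in the second call the zero is replaced by 0.0001 before taking the logarithm.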