Technion - Israel Institute of Technology

02360523 - Introduction to Bioinformatics

Winter 2013-2014

Frequently Asked Questions - HW2

What should be the format of the input sequences in phylogeny.fr?
(Or: what is FASTA format?)

FASTA format was used in class, in this HW, and in the previous one, but it was never explained explicitly.
FASTA is a plain-text file format used to represent sequences. Each sequence in the file consists of:
1) A title (name) line that starts with the character ">"
2) The sequence itself, on the following line(s)

A FASTA file can contain more than one sequence.

For example:
>sequence_A
ATGGTGCGCTC
>sequence_B
CCTGATAG
>sequence_C
GGGGCTCGCTATGAAG

When running a single sequence in BLAST, you do not need to keep the title line. But if you use a program that takes more than one sequence (such as phylogeny.fr), it is important to keep the titles so that the program can tell the sequences apart.
More on FASTA format can be found here: http://en.wikipedia.org/wiki/FASTA_format
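
To check that a file is in the expected format before submitting it to a multi-sequence tool, the rules above can be turned into a short reader. This is only a minimal sketch in Python; the function name parse_fasta is made up for illustration, and real tools may accept variations that are not handled here.

```python
def parse_fasta(text):
    """Return a list of (title, sequence) pairs from FASTA-formatted text."""
    records = []
    title = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            # A sequence may be split over several lines; join them later.
            chunks.append(line)
        # Sequence lines that appear before any title are ignored in this sketch.
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


example = """>sequence_A
ATGGTGCGCTC
>sequence_B
CCTGATAG
>sequence_C
GGGGCTCGCTATGAAG
"""

records = parse_fasta(example)
for title, sequence in records:
    print(title, len(sequence))
```

If the number of records returned is smaller than the number of sequences you pasted, a title line is probably missing.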

Q2.2: What does it mean that the algorithm converges?
PSI-BLAST may run several iterations. When we say it converged, we mean that an iteration found no new results compared to the previous one. The number of iterations needed for convergence differs between sequences and between parameter settings.
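
The idea of convergence can be illustrated with a toy loop. This is not PSI-BLAST itself: search_round is a hypothetical stand-in for one iteration, and the neighbours table is invented data used only to make the sketch runnable.

```python
def run_until_converged(search_round, query, max_iterations=10):
    """Repeat search rounds until a round adds no new hits.

    search_round(query, known_hits) is a hypothetical stand-in for one
    iteration; it returns the set of hit identifiers found when the
    search is based on the hits known so far.
    """
    known_hits = set()
    for iteration in range(1, max_iterations + 1):
        hits = set(search_round(query, known_hits))
        new_hits = hits - known_hits
        if not new_hits:
            return known_hits, iteration, True  # converged: nothing new
        known_hits |= new_hits
    return known_hits, max_iterations, False  # gave up without converging


def toy_round(query, known_hits):
    # Invented data: each round also finds the neighbours of known hits.
    neighbours = {"query": {"s1", "s2"}, "s1": {"s3"}, "s2": set(), "s3": set()}
    found = set(neighbours[query])
    for hit in known_hits:
        found |= neighbours[hit]
    return found | known_hits


hits, iterations, converged = run_until_converged(toy_round, "query")
print(sorted(hits), iterations, converged)
```

In this toy data the third round finds nothing new, so the loop stops there; with a different query or different parameters the stopping point would differ.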

How do I calculate the cells of a PAM matrix?

Please follow the steps shown in the lecture (slides 18-19).
The normalized probabilities mentioned in slide 18 are calculated by counting the number of relevant substitutions and dividing it by the total number of possible substitutions (which is given in the exercise).
The normalization mentioned in slide 19 is by the frequency of the amino acid that was changed. For example, to normalize the number of substitutions A->B, divide it by the frequency of A.

Another note: since log(0) is undefined, you can replace 0 with a small number such as 0.0001 and continue the calculation.
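
The calculation described above can be sketched in Python as follows. The slides are not reproduced here, so several details are assumptions: the function name pam_cell, the base-10 logarithm, the absence of any extra scaling factor, and the choice to substitute 0.0001 for the value passed to the logarithm. The numbers in the example are made up; use the counts and frequencies given in the exercise.

```python
import math


def pam_cell(substitution_count, total_substitutions, frequency_of_original,
             log_base=10, zero_replacement=0.0001):
    """Log score for one substitution A->B, following the steps in the text."""
    # Step 1: normalized probability of the substitution.
    probability = substitution_count / total_substitutions
    # Step 2: normalize by the frequency of the amino acid that was changed (A).
    ratio = probability / frequency_of_original
    # log(0) is undefined, so replace 0 with a small number (assumed placement).
    if ratio == 0:
        ratio = zero_replacement
    # Step 3: take the logarithm (base 10 is an assumption).
    return math.log(ratio, log_base)


# Made-up numbers: A->B seen 3 times out of 100 substitutions, frequency of A = 0.1
score = pam_cell(3, 100, 0.1)
print(round(score, 3))

# A substitution that was never observed still gets a finite score.
print(round(pam_cell(0, 100, 0.1), 3))
```

If the lecture uses a different logarithm base or multiplies the result by a constant, change log_base or scale the returned value accordingly.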
