Technion - Israel Institute of Technology

02360606 - Deep Learning (Advanced Topics in Computer Science 6)

Spring 2017-2018

Frequently Asked Questions - HW 4


Why do we use BLEU? It is not the right measure for this task. Why don't we use perplexity to evaluate our language model?

In general you are right. However, suppose there is an underlying “good” movie that all the positive reviews are written about. We then want to evaluate whether your generated positive review describes this movie well (and likewise for the negative reviews). We expect the BLEU of a generated positive review against the positive reviews to be higher than its BLEU against the negative reviews.
We don’t expect high BLEU values; we are more interested in having you experiment with this evaluation method.
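
To make the comparison concrete, here is a minimal sketch in Python of scoring one generated review against a positive and a negative reference set. It is not the course's reference implementation: the `bleu` function is a simplified sentence-level BLEU (up to bigrams, with add-one smoothing, both of which are assumptions made for illustration), and the review texts are made-up toy data.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=2):
    """Simplified sentence-level BLEU with add-one smoothing (illustrative only)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        # Add-one smoothing keeps the logarithm finite when nothing matches.
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    # Brevity penalty, using the reference whose length is closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical toy data standing in for the real review sets.
positive_refs = ["a wonderful movie with great acting".split(),
                 "great story and wonderful acting".split()]
negative_refs = ["a boring movie with terrible acting".split(),
                 "terrible story and boring acting".split()]
generated_positive = "a wonderful movie with great music".split()

bleu_vs_positive = bleu(generated_positive, positive_refs)
bleu_vs_negative = bleu(generated_positive, negative_refs)
print(bleu_vs_positive, bleu_vs_negative)
```

On this toy input the score against the positive references comes out higher than the score against the negative ones, which is the pattern the answer describes. On real generated reviews the absolute values will likely be much lower.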

Perplexity is a good measure for language models; however, it depends heavily on the vocabulary size you choose, so perplexities obtained with different vocabulary sizes are not directly comparable. BLEU will also better reflect the effect of this hyper-parameter.
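
The dependence on vocabulary size can be illustrated with a small sketch. It assumes a unigram model with add-one smoothing, where out-of-vocabulary words are mapped to an `<unk>` token; the function name `unigram_perplexity` and the toy sentences are hypothetical and only meant to show the effect, not to mirror the homework's model.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens, vocab_size):
    """Perplexity of an add-one-smoothed unigram model with a truncated vocabulary."""
    # Keep the most frequent training words; everything else becomes <unk>.
    keep = {word for word, _ in Counter(train_tokens).most_common(vocab_size)}

    def to_vocab(word):
        return word if word in keep else "<unk>"

    train = [to_vocab(word) for word in train_tokens]
    test = [to_vocab(word) for word in test_tokens]
    counts = Counter(train)
    num_types = len(keep) + 1  # kept words plus the <unk> token
    total = len(train) + num_types  # denominator under add-one smoothing
    log_prob = sum(math.log((counts[word] + 1) / total) for word in test)
    return math.exp(-log_prob / len(test))


# Hypothetical toy corpus.
train_tokens = "the movie was great the movie was bad the acting was fine".split()
test_tokens = "the movie was fine".split()

ppl_small_vocab = unigram_perplexity(train_tokens, test_tokens, vocab_size=2)
ppl_full_vocab = unigram_perplexity(train_tokens, test_tokens, vocab_size=7)
print(ppl_small_vocab, ppl_full_vocab)
```

The same test sentence gets a lower perplexity with the smaller vocabulary, simply because many words collapse into the frequent `<unk>` token. The model has not become better; the number just measures a different prediction task.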