Why do we use BLEU if it is not the right measure for this task? Why don't we use perplexity to evaluate our language model? | |
In general you are right. However, assume there is an underlying “good” movie that all the positive reviews are written about. We want to evaluate whether your generated positive review describes this movie well (and likewise for the negative reviews). We expect the BLEU of a generated positive review against the positive reviews to be higher than its BLEU against the negative reviews. We don’t expect high BLEU values; we are more interested in having you experiment with this evaluation method. Perplexity is a good measure for language models, but it depends heavily on the vocabulary size you choose, so it is hard to compare across models with different vocabularies. BLEU will also better reflect the effect of this hyper-parameter. |
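
To make the comparison above concrete, here is a minimal sketch of scoring one generated review against a set of positive references and a set of negative references. It is not the course's reference implementation: the `bleu` function, the add-one smoothing, the choice of `max_n=2`, and the toy review sentences are all assumptions made for illustration. In practice you would more likely use an existing BLEU implementation from an NLP library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU: clipped n-gram precision plus brevity penalty.

    Add-one smoothing is an assumption made here so that short texts
    with no matching n-grams do not collapse to a score of zero.
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # For each n-gram, the largest count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated words are not over-rewarded.
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        log_precisions.append(math.log((clipped + 1) / (total + 1)))

    # Brevity penalty, using the reference length closest to the candidate.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical toy data: one generated positive review and two tiny corpora.
generated = "this movie was great and the acting was wonderful".split()
positive_refs = ["the movie was great".split(),
                 "wonderful acting and a great story".split()]
negative_refs = ["the movie was terrible".split(),
                 "boring plot and awful acting".split()]

bleu_pos = bleu(generated, positive_refs)
bleu_neg = bleu(generated, negative_refs)
print(f"BLEU vs positive reviews: {bleu_pos:.3f}")
print(f"BLEU vs negative reviews: {bleu_neg:.3f}")
```

On this toy data the generated positive review shares more unigrams and bigrams with the positive references than with the negative ones, so its score against the positives comes out higher. Both scores remain well below 1, which matches the expectation that absolute BLEU values will be low in this setting.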