Regarding NUMBER, can a decimal number have leading zeros? e.g. 000042 | |
Yes. |
Regarding RGB, can the numbers be in hexadecimal form? Are they of type NUMBER? | |
No. The numbers inside the RGB brackets must be decimal integers, possibly with a sign. Note that these numbers are part of the lexeme of the RGB token, and in particular, do not generate a NUMBER token. |
Regarding RGB, should the lexeme be printed with the whitespaces inside? | |
Yes. Keep every lexeme exactly as it appears in the input text, except for strings and comments. For instance, if the lexeme is: rgb(10, 20, 30) then the printed line should be: 1 RGB rgb(10, 20, 30) |
Regarding COMMENT, if we have a nested unclosed comment, which error should be printed? e.g. /* test /* (EOF) | |
The nested comment error (warning). |
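The priority stated in this answer can be sketched as follows. This is a minimal Python illustration, not the flex solution: the function name and the three labels are hypothetical, and reporting a nested opener as soon as it is seen is an assumption. Unprintable characters inside comments are deliberately not handled here.

```python
def classify_comment(text: str) -> str:
    """Classify input that starts with '/*'.

    Returns one of three hypothetical labels:
    'COMMENT', 'NESTED_COMMENT' or 'UNCLOSED_COMMENT'.
    """
    if not text.startswith("/*"):
        raise ValueError("expected the input to start with '/*'")
    i = 2  # skip the opening '/*'
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            # A second opener is seen before any closer, so the nested
            # comment error wins even if EOF follows later.
            return "NESTED_COMMENT"
        if pair == "*/":
            return "COMMENT"
        i += 1
    # EOF reached without a closer and without a nested opener.
    return "UNCLOSED_COMMENT"
```

With this sketch, the input from the question, `/* test /* ` followed by EOF, is classified as `NESTED_COMMENT`, while `/* test ` followed by EOF is `UNCLOSED_COMMENT`.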
Regarding UNPRINTABLE CHARACTERS, if an unprintable character appears in a string or a comment, which error should we print? | |
Such input is not a valid string or comment. It is also not an unclosed one: an unclosed comment must end at EOF, and an unclosed string must end at a new line or EOF, and neither is the situation in this question. Therefore, such a string/comment is not caught as any defined token or bad pattern, and the general error should be printed, as seen in these examples:
Input: /* ... (bad character here) ... */
Output: Error /
Input: " ... (bad character here) ...
Output: Error "
Note: For those of you who used start conditions, this means you have to print the "Error /" artificially. It will not be caught by the "dot" pattern, so you have to handle it inside the start condition yourselves. |
Regarding UNPRINTABLE CHARACTERS, when to ignore them and when to print an error? #confused | |
First, understand this: inside a STRING, each character can be written in two ways:
- Normally: directly, explicitly, unescaped. For example, W.
- Escaped: implicitly, by an escape sequence. For example, \57.
Note that even though both W and \57 represent the letter W, each one is written differently. Also note that OUTSIDE of a STRING, ONLY the normal way can be used to write a single character.
Now back to the question:
(1) If an unprintable character appears in the input directly, unescaped, then this is an error. If this happens "inside" a STRING, then it is not actually a STRING, because by definition a STRING has only printable characters. Thus, the quotation mark (double or single) is by itself invalid, since it is not part of any lexeme. Therefore, the answer to the previous FAQ question regarding unprintable characters holds (i.e. print: Error ").
(2) If an unprintable character appears as an escape sequence, then this can happen inside a STRING only, and what is written in the homework under "printing the lexeme of a string" holds. |
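Case (1) can be sketched in Python as below. This is only an illustration under stated assumptions: "printable" is taken to mean ASCII 0x20 to 0x7E (the homework PDF has the authoritative definition), and both function names are hypothetical.

```python
def is_printable(ch: str) -> bool:
    # Assumption: printable means ASCII 0x20..0x7E; check the homework
    # PDF for the exact definition.
    return 0x20 <= ord(ch) <= 0x7E


def raw_unprintable_error(text: str):
    """text starts with a quote character (' or ").

    Returns 'Error <quote>' if a raw unprintable character appears
    before the string is closed or cut off by a new line, else None.
    """
    quote = text[0]
    i = 1
    while i < len(text):
        ch = text[i]
        if ch in "\r\n":
            return None  # unclosed string: a different error applies
        if not is_printable(ch):
            # Not a STRING at all, so the quote itself is the bad character.
            return f"Error {quote}"
        if ch == quote:
            return None  # properly closed
        if ch == "\\" and i + 1 < len(text) and is_printable(text[i + 1]):
            i += 2  # skip the escaped character so \" does not end the scan
            continue
        i += 1
    return None
```

An escape sequence such as `\57` never triggers this check, because every character of it is printable in the input; only case (2) of the answer applies to it.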
Regarding STRING, what error should be printed if an unclosed string ends with a backslash? | |
There is always something after the backslash. In this case, it is one of these: \r, \n, or \0 (a C string always ends with \0, and flex's yytext is a C string). Thus, this is an undefined escape sequence. Note: As stated in the homework, we expect a new line after each printed error. Make sure that a new line is printed in all cases; one case might not be trivial. |
Regarding RGB, what is this input considered: rgb(0,0,0 | |
Error in rgb parameters. It is enough to have one bracket after a function name to know that it is a function (and not a variable). Therefore, this lexeme is to be considered NOT as a NAME with a bracket following it, but instead as RGB with bad structure. The following shorter input also prints "error in rgb parameters": rgb( |
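The RGB answers in this FAQ can be summarized by a small Python sketch. The FAQ only states that the parameters are decimal integers, possibly signed, and that `rgb(` is enough to commit to RGB; the three-parameter shape, the optional spaces and tabs around parameters, the function name, and the exact error text are assumptions made for illustration.

```python
import re

# Assumption: three parameters, each a decimal integer with an optional
# sign, with optional spaces/tabs around them. Leading zeros are allowed.
_INT = r"[+-]?[0-9]+"
_WS = r"[ \t]*"
RGB_OK = re.compile(
    rf"rgb\({_WS}{_INT}{_WS},{_WS}{_INT}{_WS},{_WS}{_INT}{_WS}\)"
)


def classify_rgb(line_no: int, text: str):
    """Return the printed line for a valid RGB lexeme, an error message
    for input that starts like RGB but is malformed, or None otherwise."""
    if RGB_OK.fullmatch(text):
        # The lexeme is printed as-is, inner whitespace included.
        return f"{line_no} RGB {text}"
    if text.startswith("rgb("):
        # One bracket after the name is enough to rule out NAME.
        return "Error in rgb parameters"
    return None  # not RGB-like, e.g. a plain NAME
```

Under these assumptions, `classify_rgb(1, "rgb(10, 20, 30)")` yields `1 RGB rgb(10, 20, 30)`, while both `rgb(0,0,0` and `rgb(` yield the parameter error.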
Regarding STRING, should we check undefined escape sequences in a double-quoted string as well? | |
Yes. The string "\qqq" will issue the error: "Error undefined escape sequence q". |
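As a rough Python sketch of this check: scan the string body, and report the first character that follows a backslash if it is neither a hexadecimal digit nor one of n, r, t, \. The function name is hypothetical, and the body is assumed to be the text between the quotes.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
SIMPLE_ESCAPES = set("nrt\\")


def undefined_escape_error(body: str):
    """Return the error message for the first undefined escape sequence
    in the string body, or None if every escape sequence is defined."""
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt not in HEX_DIGITS and nxt not in SIMPLE_ESCAPES:
                # Only the first character after the backslash is reported.
                return f"Error undefined escape sequence {nxt}"
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return None
```

For the body `\qqq` this returns `Error undefined escape sequence q`, matching the answer above. The same check applies to single-quoted and double-quoted strings.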
ESCAPE SEQUENCES IN STRINGS (SUMMARY) | |
Everything here was previously explained in other FAQ questions or in the PDF.
First, we define an escape sequence to be a backslash with at least one PRINTABLE character after it. This means that if an unprintable character follows a backslash, it is not a valid escape sequence, and it cannot be found inside a STRING or any other lexeme. More on this in other FAQ questions.
Now, there are 4 types of escape sequences.
(1) Undefined escape sequences. This is the case when the backslash is followed by a (printable) character that is NOT a hexadecimal digit and also NOT one of these four: n, r, t, \ Here are 4 examples: \q, \qqq, \H, \" In this case, it is an error (EVEN in double-quoted strings) and we print: Error undefined escape sequence q (or whichever character applies, instead of 'q'). Note that we only write the first character that appeared after the backslash, even if there were more.
Any other escape sequence is a defined escape sequence. In particular, it is NOT an error, regardless of what it is. Now we will consider the three types of defined escape sequences. In all three of the following cases, in double-quoted strings, we print the sequence as-is.
(2) One of \r, \n, \t, \\ In this case, in single-quoted strings, we print the (one) character that is represented by this sequence. For example, \r becomes the (one) character '\r', and \\ becomes (one) backslash.
(3) Backslash and then hexadecimal digits (up to 6 digits), such that the number they represent is the ASCII value of a PRINTABLE character. In this case, in single-quoted strings, we print the character that they represent (the character whose ASCII value is this number) instead of the sequence. Example: '\57' --> W (because the ASCII value of W is 57 in hexadecimal)
(4) Backslash and then hexadecimal digits (up to 6 digits), such that the number they represent is **NOT** the ASCII value of a PRINTABLE character. In this case, in single-quoted strings, we do not print the sequence and print nothing in its place. We continue to print whatever comes after it inside the string. Examples: "a\C0a" --> a\C0a and 'a\C0a' --> a (the trailing a is itself a hexadecimal digit, so it belongs to the sequence).
Hope it is clearer now. |
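The summary can be condensed into a short Python sketch of how a string body would be printed. It is an illustration, not the reference solution: "printable" is assumed to mean ASCII 0x20 to 0x7E, hexadecimal digits are consumed greedily up to 6, the function name is hypothetical, and an undefined escape sequence raises an exception here instead of printing the error line.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def render_string(body: str, single_quoted: bool) -> str:
    """Return the printed form of a string body (text between the quotes)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if nxt in SIMPLE:                       # type (2)
            out.append(SIMPLE[nxt] if single_quoted else body[i:i + 2])
            i += 2
        elif nxt in HEX_DIGITS:                 # types (3) and (4)
            j = i + 1
            while j < len(body) and j < i + 7 and body[j] in HEX_DIGITS:
                j += 1                          # greedy, at most 6 digits
            code = int(body[i + 1:j], 16)
            if not single_quoted:
                out.append(body[i:j])           # double-quoted: print as-is
            elif 0x20 <= code <= 0x7E:          # assumed printable range
                out.append(chr(code))           # type (3)
            # type (4) in a single-quoted string: print nothing
            i = j
        else:                                   # type (1)
            raise ValueError(f"undefined escape sequence {nxt}")
    return "".join(out)
```

With this sketch, the body `a\C0a` renders as `a\C0a` when double-quoted and as `a` when single-quoted, and the single-quoted body `\57` renders as `W`, in line with the examples in the summary.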