Technion - Israel Institute of Technology

02360360 - Theory Of Compilation

Spring 2019-2020

Frequently Asked Questions - HW1

What should we do if we find an unprintable character inside a closed/unclosed string?

*****Update*****
In order to avoid confusion regarding the multiple possible cases of unprintable characters inside strings, we have decided to allow you to assume that strings (both closed and unclosed) won't contain unprintable characters in the test input files.
Note however that unprintable characters can still appear *outside* of strings in the test files.
Also note - this update should not affect those of you who have already submitted the assignment.

What should we do if the hexadecimal number inside of an escape sequence of the form \u{n} represents an unprintable character?
In this case this simply means that the whole escape sequence is illegal, so you should print the error described in section 4 under "Error Handling" in the HW1 instructions document (page 7).

What should we do if we find an escape sequence inside a comment?

Escape sequences *only* have a meaning if they appear inside a string.
If you find an escape sequence inside a comment you should just ignore it. This applies to both types of comments.
Note - make sure that you understand the difference between an escape sequence and the char that it represents (for example: the LF char vs. the escape sequence '\n' - they are *not* the same thing).
Example:
for the input:
/* this is a comment \n \r \t \u{41} */
The correct output is:
1 COMMENT 1

Note that all the escape sequences (including '\n' and '\r') in the above comment were ignored.

Is it possible for a single line comment (starts with //) to be the last line of an input file?
e.g.
---
var x = 3
var y = 2
// Comment at the end
----

Update - note the slight change to the definition of the single line comment.
Yes. The correct definition for the single line comment (starting with //) actually is:
The lexeme starts with // followed by zero or more printable characters except for a new line character (LF or CR).
Thus a single line comment can be found at the last line of an input file.

What should we do if we find an unprintable char inside a comment?

Since unprintable chars don't have glyphs, for this answer only we'll denote a possible unprintable char with: %
There are two cases for this:
1. If an unprintable char is found inside a single-line comment (starts with '//' ), then following the correct definition of the single-line comment (see the above question) this means that this char is not a part of the comment, and thus an error should be printed *after* what should be printed for the comment.
Example:
For the input:
// some comment %

The correct output is:
1 COMMENT 1
Error %

2. If an unprintable char is found inside a multi-line comment (starts with '/*' ), then *only* an error should be printed (since the comment itself is not formed properly):
Example:
For the input:
/*sdfsdf % */

The correct output is:
Error %
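Case 1 above can be sketched in C as follows. This is only an illustration, not the required implementation: the helper names `is_printable` and `single_line_comment_len` are hypothetical, and the printable range used here (ASCII 0x20 to 0x7E plus tab) is an assumption; the HW1 instructions document holds the authoritative definition.

```c
#include <stddef.h>

// Assumption: "printable" means ASCII 0x20..0x7E plus tab.
// LF, CR, and the terminating '\0' fall outside this range.
static int is_printable(unsigned char c) {
    return c == '\t' || (c >= 0x20 && c <= 0x7E);
}

// Hypothetical helper: returns the length of the single-line comment
// lexeme that starts at s, or 0 if s does not start with "//".
// The lexeme stops before the first char that is not printable, so an
// unprintable char is left in the input to be reported afterwards.
size_t single_line_comment_len(const char *s) {
    if (s[0] != '/' || s[1] != '/') {
        return 0;
    }
    size_t i = 2;
    while (is_printable((unsigned char)s[i])) {
        i++;
    }
    return i;
}
```

With this sketch, the comment in the case 1 example ends right before the unprintable char, so the COMMENT line is printed first and the error is produced when scanning resumes at that char.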

Can comments be empty (contain zero characters)?
Yes, both types of comments can contain zero characters. For example, both: // and: /**/ are legal comments.

What should we do if we find an unescaped " (double quotation mark) character in the middle of a string?

This question rests on a misreading of the input: an unescaped " character cannot appear in the middle of a string, because it ends the string.
Example:
for the input:
"some string"thisisanID"

the correct output is:
1 STRING some string
1 ID thisisanID
Error unclosed string

Note that an unescaped " character in a string simply terminates the string.

A similar explanation applies for the following example:
for the input:
/* some comment */ */

the correct output is:
1 COMMENT 1
1 BINOP *
1 BINOP /

Note that a closing sequence */ in a multi line comment (a comment which starts with /* and ends with */) simply terminates the comment.

What should we do if we find a single line comment nested inside a multi-line comment or vice versa?

For the purpose of this exercise you should print the "Warning nested comment\n" message only if you find a multi-line comment nested inside another multi-line comment.
On the other hand, consider it *legal* to have a single line comment nested inside a multi-line comment or vice versa. In these cases the nested comment is just considered plain text that is part of the outer comment.

Examples of inputs/outputs:

/* outer comment /* illegal nested comment */ */
Warning nested comment

// /* this multi-line comment inside a single-line comment is legal */
1 COMMENT 1

/* // this single-line comment inside a multi-line comment is legal */
1 COMMENT 1
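A minimal C sketch of the rule illustrated by these examples, under stated assumptions: the names `CommentStatus` and `scan_multiline_comment` are hypothetical, the input is a NUL-terminated buffer, and the handling of unprintable chars inside the comment is left out.

```c
#include <stddef.h>

typedef enum { COMMENT_OK, COMMENT_NESTED, COMMENT_UNCLOSED } CommentStatus;

// Hypothetical helper: s points at the opening delimiter of a multi-line
// comment. On COMMENT_OK, *len receives the lexeme length, including both
// delimiters. A "//" inside the comment is treated as plain text.
CommentStatus scan_multiline_comment(const char *s, size_t *len) {
    size_t i = 2;  // skip the opening delimiter
    while (s[i] != '\0') {
        if (s[i] == '*' && s[i + 1] == '/') {
            *len = i + 2;  // the first closing sequence ends the comment
            return COMMENT_OK;
        }
        if (s[i] == '/' && s[i + 1] == '*') {
            return COMMENT_NESTED;  // a second opening sequence before the close
        }
        i++;
    }
    return COMMENT_UNCLOSED;
}
```

A single-line comment needs no such check: once // has been seen, everything up to the end of the lexeme is part of that comment, including any /* or */.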

Are we allowed to use the C standard library for our implementation?
Yes, it is allowed and even recommended. A description of the standard library can be found here: https://en.wikipedia.org/wiki/C_standard_library

Can lexemes of the DEC_INT token begin with a sequence of zeros?
For example: is 000033 a valid lexeme for the DEC_INT token?

Yes, and for the above example the correct output should be:
1 DEC_INT 33
Note that in this case the value of the number should be printed *without* the sequence of zeros in the beginning.

Also, just to be clear - for DEC_REAL on the other hand you're required to print the lexeme of the number *exactly* as it appears in the input. For example:
For the input:
0002.2e+10
The correct output is:
1 DEC_REAL 0002.2e+10

Can lexemes of the BIN_INT/OCT_INT/HEX_INT/HEX_FP tokens begin with a sequence of zeros?
Lexemes of the above tokens can include a sequence of zeros after their respective starting sequences ('0b', '0o', '0x', '0x' respectively) but not before them.
Examples:
For the input:
0b00101
The correct output is:
1 BIN_INT 5

For the input:
000b101
The correct output is:
1 DEC_INT 0
1 ID b101

For the input:
0x000FFp+6
The correct output is:
1 HEX_FP 0x000FFp+6
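Since the C standard library is allowed, one possible way to obtain the printed value of an integer lexeme is `strtoul`, which skips leading zeros on its own. The sketch below is an illustration only: `lexeme_value` is a hypothetical helper, and it covers just the two integer cases whose output is shown in this FAQ (DEC_INT and BIN_INT). DEC_REAL and HEX_FP lexemes are printed exactly as they appear, so they need no conversion.

```c
#include <stdlib.h>

// Hypothetical helper: converts an integer lexeme to its numeric value.
// prefix_len is 0 for DEC_INT and 2 for a prefixed form such as "0b";
// base is the radix of the digits that follow the prefix.
unsigned long lexeme_value(const char *lexeme, size_t prefix_len, int base) {
    return strtoul(lexeme + prefix_len, NULL, base);
}
```

For example, `lexeme_value("000033", 0, 10)` gives 33 and `lexeme_value("0b00101", 2, 2)` gives 5, which match the DEC_INT and BIN_INT outputs shown in this FAQ.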

Can the number n in escape sequences of the form \u{n} start with leading zeros?
Yes, as long as n consists of 1-6 total digits (as described in the HW pdf instructions).
Examples:
for the input:
"\u{0041}"
the correct output is:
1 STRING A

for the input:
"\u{0000041}"
the correct output is:
Error undefined escape sequence u
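The digit-count rule, together with the rule that n must represent a printable character, could be checked along these lines. This is a sketch under assumptions: `decode_u_escape` is a hypothetical helper, and the printable range (0x20 to 0x7E) is a guess that should be replaced by the definition in the HW1 instructions document.

```c
#include <ctype.h>
#include <stdlib.h>

// Hypothetical helper: s points just past "\u". On success, stores the
// decoded value in *code and returns the number of chars consumed
// (the braces plus the digits). Returns 0 if the sequence is illegal.
size_t decode_u_escape(const char *s, long *code) {
    if (s[0] != '{') {
        return 0;
    }
    size_t digits = 0;
    while (isxdigit((unsigned char)s[1 + digits])) {
        digits++;
    }
    // Leading zeros count toward the 1-6 digit limit.
    if (digits < 1 || digits > 6 || s[1 + digits] != '}') {
        return 0;
    }
    long value = strtol(s + 1, NULL, 16);
    // Assumption: printable means 0x20..0x7E.
    if (value < 0x20 || value > 0x7E) {
        return 0;
    }
    *code = value;
    return digits + 2;
}
```

With this sketch, `{0041}` decodes to 0x41 (the char A), while `{0000041}` is rejected because it has seven digits.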

What should we print for inputs such as:
"\u

In this case there are two errors: an illegal escape sequence error and an unclosed string error.
Intuitively, think of the (illegal) escape sequence as being 'inside' the string's text, and of the missing " character as being 'outside' the string's text.
With this in mind, we consider the "inner" error to be detected first.
Thus, the expected output for this case is:
Error undefined escape sequence u

What should we print for the following input (two characters total):
"\

This is a special case in which the \ character does not technically form an escape sequence since it is not followed by any char (note that EOF is *not* technically a char).
However, since we do have a case of an unclosed string here, the expected output is therefore:
Error unclosed string

Note - if the above input is followed by a LF or CR char then an illegal escape sequence is formed.
Thus the correct output would be:
Error undefined escape sequence
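The ordering of the two string errors can be sketched in C as below. Everything here is illustrative: `StringStatus`, `escape_len` and `scan_string` are hypothetical names, the set of legal single-char escapes (\n, \r, \t, \\, \") is an assumption, the printable check on \u{n} is omitted, and raw newlines inside a string are not treated specially. The buffer's terminating '\0' stands in for EOF.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { STR_CLOSED, STR_UNCLOSED, STR_BAD_ESCAPE } StringStatus;

// Returns how many chars after the backslash form a legal escape, or 0.
// Assumption: the legal escapes are \n \r \t \\ \" and \u{n} (1-6 hex digits).
static size_t escape_len(const char *p) {
    if (p[0] != '\0' && strchr("nrt\\\"", p[0]) != NULL) {
        return 1;
    }
    if (p[0] == 'u' && p[1] == '{') {
        size_t d = 0;
        while (isxdigit((unsigned char)p[2 + d])) {
            d++;
        }
        if (d >= 1 && d <= 6 && p[2 + d] == '}') {
            return d + 3;
        }
    }
    return 0;
}

// s points just past the opening quote.
StringStatus scan_string(const char *s) {
    size_t i = 0;
    while (s[i] != '\0') {
        if (s[i] == '"') {
            return STR_CLOSED;  // an unescaped quote ends the string
        }
        if (s[i] == '\\') {
            if (s[i + 1] == '\0') {
                return STR_UNCLOSED;  // backslash then EOF: no escape is formed
            }
            size_t n = escape_len(s + i + 1);
            if (n == 0) {
                return STR_BAD_ESCAPE;  // the "inner" error is detected first
            }
            i += n + 1;
            continue;
        }
        i++;
    }
    return STR_UNCLOSED;
}
```

Because the scan reports an illegal escape as soon as it meets one, the input "\u yields the escape error rather than the unclosed string error, while a lone "\ at end of input reaches the EOF check and yields the unclosed string error.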
