My regexes are problematic. | |
Use a regex debugger! They let you see exactly which part of the string matches which part of the pattern. Here is a good one, but there are many others. Just be sure to use one that treats . as "any character other than \n" rather than "any character". | |
Link: | http://regexr.com/ |
Which tokens are case-sensitive and which are not? | |
You can assume everything is case-insensitive |
Is there a limit on the length of the input? | |
You can assume that each line of the input won't be longer than 1024 characters |
Can we assume integer values won't exceed an int? | |
Yes |
What should we do in case of an error? | |
All errors should be handled by printing an appropriate error message and calling exit(0) |
Can we assume yylineno holds the right value for the line number? | |
Yes |
Can a KEY start or end with a whitespace? | |
No, it can't. KEYs must start with a letter and end with a non-whitespace character. |
In case of an unclosed string containing an undefined escape sequence, which error should we report? | |
As a general rule, always report the first error you encounter when reading the input. In this case, you will always read the undefined escape sequence before you can determine the string is unclosed. Therefore, you should report the escape sequence. |
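The "first error wins" rule can be sketched in plain C as a left-to-right scan. This is only an illustration: the `scan_result` names and the `scan_quoted_body` helper are hypothetical, and the set of defined escapes is an assumption based on the sequences mentioned in this FAQ, so take the real list from the assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; these names are not from the assignment. */
typedef enum {
    SCAN_OK,
    SCAN_UNDEFINED_ESCAPE,
    SCAN_UNCLOSED_STRING
} scan_result;

/* ASSUMPTION: only the escapes mentioned in this FAQ are treated as
 * defined (\n \t \; \: \= \#). The assignment may define others. */
static int is_defined_escape(char c) {
    return c != '\0' && strchr("nt;:=#", c) != NULL;
}

/* Scans the text that follows an opening quote and returns the first
 * problem met while reading left to right. */
static scan_result scan_quoted_body(const char *body) {
    assert(body != NULL);
    for (size_t i = 0; body[i] != '\0'; i++) {
        if (body[i] == '"') {
            return SCAN_OK;                 /* closing quote found */
        }
        if (body[i] == '\\') {
            if (!is_defined_escape(body[i + 1])) {
                return SCAN_UNDEFINED_ESCAPE;  /* met before the end of input */
            }
            i++;                            /* skip the escaped character */
        }
    }
    return SCAN_UNCLOSED_STRING;            /* input ended without a quote */
}
```

Because the scan returns as soon as it meets the bad escape, it never gets far enough to notice that the closing quote is missing, which matches the answer above.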
If we have a line containing only a comment but there are whitespaces before the comment, should the whitespaces be treated as INDENT or ignored? | |
In case the line contains only whitespaces and a comment, you should ignore the whitespaces. Don't output an INDENT token for that line. |
The homework states that some tokens can only appear after INDENT or ASSIGN. What does that mean? Does it have to be directly after these tokens? Do these tokens have to appear in the same line? | |
It means that the relevant tokens can only appear after we've seen INDENT or ASSIGN in the same line, but NOT necessarily directly after INDENT or ASSIGN. It is possible for other tokens to appear between INDENT/ASSIGN and the relevant token. For example, the following lines are valid examples for the token SEP matching the lexeme ",": , key = , 10 , The following will be considered an error (since there is neither INDENT nor ASSIGN): , Same applies to ASSIGN after KEY. |
Can strings contain whitespaces and tabs? | |
Yes |
The homework states that SECTION and LINK contain a KEY. Does that mean they also must appear at the start of a line? | |
No. |
In t2.in, it seems like the string a\nb\tc ends at line 3, but t2.out states it ends at line 4. Is that a mistake? | |
No, it's not. This is a side-effect of the way flex works, since it can only detect that the string has ended once it reaches line 4. Don't try to avoid it. This same side-effect also appears in t1.in when parsing the string "value". In general, if an unquoted string ends at the end of the line, it is ok for it to be listed as ending on the following line. We will accept both possible outputs for these cases. |
Can an unquoted string start with a whitespace? | |
No, it can't |
Can an unquoted string contain ';' or '#'? | |
According to the definition of COMMENT, any ';' and '#' outside of a quoted string represent the start of a comment. That means that if a ';' or a '#' appears in an unquoted string, it will end the string and start a comment. Therefore the answer is No. |
Is 007 an octal or a decimal number? | |
It is an octal number. If the number starts with 0 and only uses the digits 0-7, it is octal. Otherwise, it's decimal. |
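The octal/decimal rule above can be sketched as a small C helper. The function names are hypothetical, and the sketch assumes it receives only the digit part of the lexeme, with no sign.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 8 if the digit string starts with '0' and uses only the
 * digits 0-7, otherwise 10. Expects a non-empty, unsigned digit run. */
static int integer_base(const char *digits) {
    assert(digits != NULL && digits[0] != '\0');
    if (digits[0] != '0') {
        return 10;
    }
    for (const char *p = digits; *p != '\0'; p++) {
        if (*p < '0' || *p > '7') {
            return 10;   /* an 8 or 9 makes the whole number decimal */
        }
    }
    return 8;
}

/* Converts the digit string using the base chosen above. */
static long integer_value(const char *digits) {
    return strtol(digits, NULL, integer_base(digits));
}
```

With this sketch, `integer_value("007")` is 7, `integer_value("010")` is 8, and `integer_value("089")` is 89, since the digits 8 and 9 rule out octal.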
Can an unquoted string start with any legal character? | |
To avoid confusion, you may assume that unquoted strings only start with a letter. You can treat any other case as an error. |
The definition of a quoted string states that it can contain whitespaces, tabs and newlines, which will be replaced with whitespaces. | |
Only newlines are replaced with whitespaces. The other 2 options remain as is. |
Can an unquoted string end with whitespaces? | |
No |
The homework states that unquoted strings can't contain escape sequences, but the example input contains \n. Is that an error? | |
This is not an error. A sequence of characters is an escape sequence only if it has a special meaning. The fact that unquoted strings don't contain escape sequences simply means that these sequences don't have any special meaning. If \n appeared in a quoted string it would be treated as a newline. When it appears in unquoted strings it is just a pair of characters, '\' and 'n'. |
How should we print \; \: \= \#? | |
\; \: \= \# should be printed as ; : = # respectively |
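One possible way to produce that output, shown as a C sketch with a hypothetical helper name. It handles only the four sequences from this question and copies any other backslash pair unchanged, which is an assumption made for the illustration, not the assignment's rule for other escapes.

```c
#include <assert.h>
#include <string.h>

/* Copies src to dst, turning \; \: \= \# into ; : = #.
 * dst must have room for strlen(src) + 1 bytes. */
static void unescape_punctuation(const char *src, char *dst) {
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] != '\0' &&
            strchr(";:=#", src[1]) != NULL) {
            *dst++ = src[1];   /* keep the character, drop the backslash */
            src += 2;
        } else {
            *dst++ = *src++;   /* anything else is copied unchanged */
        }
    }
    *dst = '\0';
}
```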
It is stated that some tokens can only appear after an INDENT or ASSIGN token appeared in the same line. | |
Yes, the above input is legal. For the purposes of determining whether an INDENT/ASSIGN appeared in the line, you should treat the start and end of a quoted string as being part of the same line. A newline in a quoted string is the only exception in which the newline doesn't "reset" the presence of an INDENT/ASSIGN token. |
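The bookkeeping described here can be sketched as a small state record in C. All names below are hypothetical. In a flex scanner the same idea would more likely be a global flag plus a start condition for quoted strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-line scanner state. */
typedef struct {
    bool seen_indent_or_assign;  /* INDENT or ASSIGN seen on this line */
    bool in_quoted_string;       /* currently between two quotes */
} line_state;

static void on_indent_or_assign(line_state *st) {
    assert(st != NULL);
    st->seen_indent_or_assign = true;
}

/* Called for an opening or a closing quote. */
static void on_quote(line_state *st) {
    assert(st != NULL);
    st->in_quoted_string = !st->in_quoted_string;
}

/* A newline resets the flag, except inside a quoted string. */
static void on_newline(line_state *st) {
    assert(st != NULL);
    if (!st->in_quoted_string) {
        st->seen_indent_or_assign = false;
    }
}
```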
Should we discard any trailing whitespace after strings or comments? | |
Yes |
Given the following input: | |
Since '\' on its own is not allowed in quoted strings, that would be an illegal character error. |
There have been a lot of questions regarding the overlap between STRING tokens and other types of tokens. | |
Non-STRING tokens should be given precedence. That means that if the input is "key=true true", it should be parsed as 2 TRUE tokens, not a single STRING token. Generally speaking, you should return a STRING token only if the first word can't be parsed differently, i.e. prioritize the more specific tokens. In the example above, the first word was true, so we matched it with TRUE. If the input was "key=true123", we would match it with TRUE and INTEGER. If the input was "key=abctrue 123", we would match everything as a STRING since "abctrue" doesn't match any more specific token, or more precisely because "abc" doesn't match anything else. We always look at the beginning of the string and ask whether it can be something more specific than STRING. Note that it is possible to implement this functionality using regexes. It does not require processing the input in C code. |
Are . and -. valid REAL numbers? | |
REALs need to include at least one digit. Input such as . and -. would not be valid REALs. |
Should we print + before integers? | |
Print 7. More generally, the output for integers should be the same as the output of printf("%d", n); where n is an int variable containing the value of your integer. |
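A minimal sketch of the rule, assuming a decimal lexeme with an optional sign (octal lexemes would need a different base). The helper name is hypothetical, and it writes to a buffer with `snprintf` so the result is easy to inspect; `%d` behaves the same as in `printf`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Converts a decimal lexeme to an int and formats it with "%d".
 * Returns the number of characters written, as snprintf does. */
static int format_integer(const char *lexeme, char *out, size_t out_size) {
    assert(lexeme != NULL && out != NULL && out_size > 0);
    int n = (int)strtol(lexeme, NULL, 10);  /* accepts a leading '+' or '-' */
    return snprintf(out, out_size, "%d", n);
}
```

Since `%d` never prints a plus sign, a lexeme with a leading `+` loses it in the output, while a leading `-` is kept.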