In the web_crawl_bfs_236700_k3.txt example we see that .txt files are included in the search. Is this legal?
|
Yes. As long as the browser can read a file and parse it as text, the file may contain HTML code. Files such as .html/.txt/.xml are therefore considered valid links. Binary files, such as .zip/.pdf/.mp3/.mov, cannot be parsed as text and thus cannot contain HTML code. The HTML parser you're using will usually handle this distinction naturally (see the example project), so you don't need to write much code for it. |
Can we define more than one interface in our API?
|
Yes. An API can consist of several Interfaces/Classes. |
What should be the order of the BFS?
|
Same as in the provided example. The order of the neighbors is the order in which the unique links first appear in the HTML document, and the BFS should traverse them in that order. For example, if document A contains the following links, in this order: B C D B A E, the BFS will first process B, then C, D and E (the repeated B and the link back to A are skipped). |
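The ordering rule above can be sketched in Java. This is only an illustration under stated assumptions: the class name `BfsOrderSketch`, the method `bfs`, and the in-memory `links` map (standing in for links extracted by an HTML parser) are hypothetical and not part of the assignment's API. The root is treated as depth 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: BFS over a link graph, keeping document order of unique links.
class BfsOrderSketch {

    /** Returns pages in the order the BFS processes them, down to maxDepth (root is depth 1). */
    static List<String> bfs(String root, Map<String, List<String>> links, int maxDepth) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> current = new ArrayDeque<>();
        seen.add(root);
        current.add(root);
        for (int depth = 1; depth <= maxDepth && !current.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String page : current) {
                order.add(page);
                // LinkedHashSet drops repeated links but keeps first-appearance order.
                Set<String> unique = new LinkedHashSet<>(links.getOrDefault(page, List.of()));
                for (String link : unique) {
                    // Set.add returns false for pages that were already discovered.
                    if (seen.add(link)) {
                        next.add(link);
                    }
                }
            }
            current = next;
        }
        return order;
    }
}
```

With `links` mapping A to the list B C D B A E, `BfsOrderSketch.bfs("A", links, 2)` returns A, B, C, D, E. Whether and how the root appears in the printed output is determined by the supplied .txt examples, not by this sketch.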
Should the output be exactly as the supplied .txt examples?
| Yes. You should keep the same format and the same order of nodes. |
Can the API include Abstract classes?
|
Yes. An API is any collection of public method signatures, and that does not rule out including an implementation. Interfaces, abstract classes and even regular classes can all be part of an API. |
In case of an illegal URL string, a negative depth and so on, does it matter what error message we write to the log?
| No. You can write any error message that describes the error. |
Can we assume that the URLs passed to the program will be valid?
|
No. As stated in the assignment description, you should classify errors as Fatal or Non-fatal, and produce a message at the appropriate log level. Invalid links are considered Non-fatal, while invalid parameters are an example of Fatal errors. |
What should we print for k=0?
|
[Update] You should print a fatal error message for an invalid argument. The root is considered depth 1, so k=0 is not a valid depth. |
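One possible way to separate the two error classes described above is sketched below. The names `ArgumentCheckSketch`, `Severity`, `checkArguments` and `checkLink` are hypothetical, and the sketch assumes that links found in a page have already been resolved to absolute URLs. It only classifies the error; writing the message at the matching log level (for example `logger.fatal(...)` or `logger.error(...)` in Log4j 1.2) is left to the caller.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: classify invalid parameters as fatal and invalid links as non-fatal.
class ArgumentCheckSketch {

    enum Severity { OK, NON_FATAL, FATAL }

    /** True if the string parses as an absolute URI (assumption: relative links were resolved earlier). */
    static boolean isWellFormed(String url) {
        if (url == null) {
            return false;
        }
        try {
            return new URI(url).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Program parameters: a bad root URL or a depth below 1 stops the run. */
    static Severity checkArguments(String rootUrl, int depth) {
        // The root counts as depth 1, so 0 and negative depths are invalid arguments.
        if (depth < 1 || !isWellFormed(rootUrl)) {
            return Severity.FATAL;
        }
        return Severity.OK;
    }

    /** A link found while crawling: a bad one is reported and skipped. */
    static Severity checkLink(String url) {
        return isWellFormed(url) ? Severity.OK : Severity.NON_FATAL;
    }
}
```

Under these assumptions, `checkArguments("http://example.com/", 0)` yields `FATAL`, while `checkLink("not a url")` yields `NON_FATAL` and the crawl can continue with the next link.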
Sometimes we get an exception when connecting to an address due to connection timeouts.
What should we do?
|
When the same addresses are requested frequently, the Technion server (and others as well) may block access from certain clients. In such cases, you should print the error message at the error log level and continue to the next address. Check again after a few minutes to confirm that blocking is indeed the cause. Otherwise, you may have other problems that require further debugging. |
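The "log and continue" behavior can be sketched as follows. The names `TimeoutHandlingSketch`, `PageFetcher` and `fetchAll` are hypothetical, and the `errorLog` list merely stands in for a logger call at error level. A connection timeout surfaces in Java as `java.net.SocketTimeoutException`, which is an `IOException`, so a single catch clause covers it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a failed connection is recorded as an error and the crawl moves on.
class TimeoutHandlingSketch {

    /** Stand-in for whatever code downloads a page; may fail with a timeout. */
    interface PageFetcher {
        String fetch(String url) throws IOException;
    }

    /** Tries every address in order and returns those that were read successfully. */
    static List<String> fetchAll(List<String> urls, PageFetcher fetcher, List<String> errorLog) {
        List<String> fetched = new ArrayList<>();
        for (String url : urls) {
            try {
                fetcher.fetch(url);
                fetched.add(url);
            } catch (IOException e) {
                // Non-fatal: report at error level, then continue with the next address.
                errorLog.add("ERROR: could not read " + url + ": " + e.getMessage());
            }
        }
        return fetched;
    }
}
```

The fetcher is passed in so that the loop can be exercised without network access. A real fetcher would typically set connect and read timeouts on its connection.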
When running our program from Eclipse everything works fine.
But when we try running it using the ANT file we get the following error message from Log4j:

crawl:
[java] log4j:WARN No appenders could be found for logger (technion.gc.web.Main2).
[java] log4j:WARN Please initialize the log4j system properly.
[java] log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
crawlDomain:
[java] log4j:WARN No appenders could be found for logger (technion.gc.web.Main1).
[java] log4j:WARN Please initialize the log4j system properly.
[java] log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The problem is that ANT executes the program from the bin folder, where all the binary files are stored after compilation, so it does not find the log4j.properties file needed to configure Log4j. While Eclipse links the src and resources folders at runtime, ANT and Java command-line invocations do not know about these folders (unless they are on the classpath). The solution is simple: copy the log4j.properties file to the bin folder yourself. Remember to place it under the same path it had in the resources or src folder. For example, if log4j.properties was under resources/A/B/C, you should copy it to bin/A/B/C. |
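To confirm whether the configuration file is visible at runtime, a small classpath check can help. This is a diagnostic sketch only: the class `Log4jConfigCheck` and the method `isOnClasspath` are hypothetical, and the path passed in must match wherever your log4j.properties actually lives relative to a classpath root.

```java
import java.net.URL;

// Hypothetical diagnostic: is a resource reachable from the classpath the program runs with?
class Log4jConfigCheck {

    /** Returns true if the class loader can locate the given resource path. */
    static boolean isOnClasspath(String resourcePath) {
        // getResource returns null when no classpath entry contains the resource.
        URL location = Log4jConfigCheck.class.getClassLoader().getResource(resourcePath);
        return location != null;
    }
}
```

Calling `Log4jConfigCheck.isOnClasspath("log4j.properties")` from the ANT run and from Eclipse should show the difference: if it returns false under ANT, the file has not reached the bin folder or another classpath entry.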

