Can we assume the entire dump can be loaded into main memory? | |
No, you cannot. |
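Since the dump cannot be held in memory, it has to be processed as a stream. The sketch below is one possible way to do that, assuming the dump follows the usual Wikidata JSON layout: a single array with one entity per line, each entity line ending with a comma. The name `iter_entities` is hypothetical and not part of the skeleton project.

```python
import io
import json

def iter_entities(lines):
    """Yield one parsed entity at a time from an iterable of dump lines.

    Assumes one JSON entity per line inside an enclosing array.
    """
    for line in lines:
        line = line.strip().rstrip(",")  # entity lines end with a comma
        if line in ("", "[", "]"):       # skip blanks and the array brackets
            continue
        yield json.loads(line)

# Usage with a small in-memory stand-in for the dump file
sample = io.StringIO('[\n{"id": "Q1"},\n{"id": "Q2"}\n]\n')
ids = [entity["id"] for entity in iter_entities(sample)]
```

Only one entity is parsed at a time, so memory use depends on the largest entity rather than on the size of the dump.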
How should we compare values of type "time", or of any type other than integer or string? | |
For simplicity, consider only values that are references to other items and values that are strings; ignore all the rest. Note: 1. Contrary to what the PDF says, you should not consider values of type integer. 2. A string is anything surrounded by double quotes ("). For instance, the "datatype" of a string value can be "string", "url" or "commonsMedia". |
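As a rough illustration of this rule, the sketch below keeps string values and item references and drops everything else. It assumes the standard Wikidata snak layout ("snaktype", and "datavalue" with "type" and "value"). The function name `comparable_value` is hypothetical. Item references are rendered with a Q prefix, and snaks without a concrete value are skipped, in line with other answers in this FAQ.

```python
def comparable_value(snak):
    """Return the value to compare, or None if the snak should be ignored."""
    if snak.get("snaktype") != "value":
        return None  # novalue / somevalue carry no datavalue
    datavalue = snak.get("datavalue", {})
    kind = datavalue.get("type")
    value = datavalue.get("value")
    if kind == "string":
        # Covers the "string", "url" and "commonsMedia" datatypes
        return value
    if kind == "wikibase-entityid" and value.get("entity-type") == "item":
        return "Q" + str(value["numeric-id"])
    return None  # time, quantity, coordinates, etc. are ignored

item_snak = {"snaktype": "value", "datavalue": {
    "type": "wikibase-entityid",
    "value": {"entity-type": "item", "numeric-id": 5}}}
string_snak = {"snaktype": "value",
               "datavalue": {"type": "string", "value": "abc"}}
time_snak = {"snaktype": "value", "datavalue": {
    "type": "time", "value": {"time": "+2000-01-01T00:00:00Z"}}}
```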
Why was only one similar property-value pair selected in the provided test, even though there are more similar pairs? | |
This was a bug in our tests. We have uploaded a fixed version which solves this issue. |
In the output fields "propertyId" and "similarValue", should we add the prefixes Q or P (depending on the type of the entity) if the value is a reference to another entity? | |
Yes. Note that "propertyId" always refers to a property, so the "P" prefix should always be added. That is not the case for "similarValue", where the common value is not necessarily a reference to another item. |
When checking similarity between two items, if a single property has multiple values that are common to both items, does each value form a separate property-value pair? | |
Yes. |
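To make this concrete, the sketch below counts each shared value of a property as its own pair. It assumes, hypothetically, that each item's claims have already been reduced to a mapping from property ID to a list of comparable values. The function name and the sample IDs are illustrative only.

```python
def common_pairs(base_claims, other_claims):
    """Return the sorted (propertyId, value) pairs present in both items."""
    def pairs(claims):
        return {(prop, value)
                for prop, values in claims.items()
                for value in values}
    return sorted(pairs(base_claims) & pairs(other_claims))

base = {"P31": ["Q5"], "P106": ["Q82955", "Q36180", "Q40348"]}
other = {"P31": ["Q5"], "P106": ["Q36180", "Q82955"]}
shared = common_pairs(base, other)
# shared == [("P106", "Q36180"), ("P106", "Q82955"), ("P31", "Q5")]
```

Here P106 contributes two separate pairs because two of its values appear in both items.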
Can we use the SimilarItem and Similarity classes that were provided with the skeleton project? | |
Yes, but do not import them from the test folder. Instead, copy them to the src folder. (As a reminder, your code should reside in the src folder.) |
Our implementation uses a different JSON library from the one used in the provided tests. Can we change the test files accordingly? | |
No, the test files must not be changed, and your code should pass them even though you use a different library. |
Can a base item be similar to itself? | |
No, we ignore this (uninteresting) case. |
How should we handle properties whose snaktype is novalue/somevalue? | |
You should ignore them. |
What should we put in the label field in case there is no label in English? | |
null. |
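A minimal sketch of the null case, assuming the usual Wikidata entity layout in which "labels" maps a language code to an object with a "value" field. The helper name `english_label` is hypothetical.

```python
import json

def english_label(entity):
    """Return the English label, or None when the entity has none."""
    labels = entity.get("labels") or {}  # tolerate a missing or empty field
    english = labels.get("en") or {}
    return english.get("value")

with_label = {"labels": {"en": {"language": "en", "value": "Douglas Adams"}}}
without_label = {"labels": {"fr": {"language": "fr", "value": "Exemple"}}}

# Python's None is serialized as JSON null
serialized = json.dumps({"label": english_label(without_label)})
```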
In the dry part, what exactly do you mean by returning the top-K similar items? | |
In the dry part we extend the functionality of the program you are required to implement in the wet part, so that your program returns only the top-K similar items rather than all similar items with DOS greater than a given threshold. Think of it as if we had added a fourth parameter to the find() method that specifies K. You should provide an efficient algorithm (not necessarily the most efficient one) that minimizes disk accesses, and discuss how this extension can be incorporated into your current implementation of the wet part. Clarification: You should output the top K items (i.e., the output array should contain at most K JSON objects), where these items have the largest DOS among all the returned objects. In your answer, you may consider how you would deal with an output that cannot be stored in main memory. |
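One possible way to select the top K in a single pass is a bounded min-heap, sketched below. This is an illustration of the general technique and not the expected answer. It assumes the candidates arrive as a stream of hypothetical (DOS, item ID) pairs and that K results fit in main memory, so it does not address an output too large for memory.

```python
import heapq

def top_k(scored_items, k):
    """Return the k (dos, item_id) pairs with the largest DOS, best first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the best k pairs seen so far
    for dos, item_id in scored_items:
        if len(heap) < k:
            heapq.heappush(heap, (dos, item_id))
        elif (dos, item_id) > heap[0]:
            # New pair beats the weakest kept pair, so swap it in
            heapq.heapreplace(heap, (dos, item_id))
    return sorted(heap, reverse=True)

stream = [(0.2, "Q1"), (0.9, "Q2"), (0.5, "Q3"), (0.7, "Q4")]
best = top_k(stream, 2)
# best == [(0.9, "Q2"), (0.7, "Q4")]
```

The heap never holds more than K entries, so each candidate is read once and memory use is proportional to K rather than to the number of similar items.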