|
Are there any limitations on the programming part (run time, space consumption,
code reusability, etc.) besides those stated in the exercise paper? |
No.
|
Is it recommended to implement this as part of Weka (i.e., extend weka.classifiers.Classifier)? |
It will save you a lot of time: you can reuse Weka's source code (e.g., kNN, ID3) and its visual "Experimenter".
|
The 'Breast Cancer Wisconsin' directory contains two datasets (one prognostic [wpbc], one diagnostic [wdbc]). Which one should we use? |
Use 'breast-cancer-wisconsin.data', the original dataset (20 KB).
|
The 'Heart Disease Cleveland' dataset has a reported problem with the original [75 attributes] Cleveland data (see the WARNING file in the dataset directory). Should we use the reduced 13-attribute file processed.cleveland.data?
What about the other databases in this dataset's directory? |
Use 'processed.cleveland.data'. Ignore the other datasets.
|
The 'Heart Disease' database contains a 'costs' directory: should we use
the data from there? |
Nope.
|
|