Technion - Israel Institute of Technology

02360756 - Introduction to Machine Learning

Spring 2005-2006

Frequently Asked Questions


Assignment 3

Is there any limitation on the programming part, besides what was written in the exercise paper? (run time, space consumption, code reusability, etc.)
No.

I got the feeling that it is recommended to do this as part of Weka (extend weka.classifiers.Classifier), right?
It will save you loads of time: you can (re-)use Weka's source code (e.g., kNN, ID3) and its visual "Experimenter".

The 'Breast Cancer Wisconsin' directory contains two datasets (one prognostic [wpbc], one diagnostic [wdbc]). Which one should we use?
Use the original dataset, 'breast-cancer-wisconsin.data' (20 KB).

The 'Heart Disease Cleveland' dataset has a reported problem with the original [75 attributes] Cleveland data (see the file WARNING in the dataset directory): should we use the reduced 13-attribute file processed.cleveland.data? What about the other databases in this dataset's directory?
Use 'processed.cleveland.data'. Ignore the other datasets.

The 'Heart Disease' database contains a 'costs' directory: should we use the data from there?
Nope.
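
For those not loading the data through Weka, here is a minimal sketch of reading rows from the two files named above. It assumes, as in the UCI copies of 'breast-cancer-wisconsin.data' and 'processed.cleveland.data', that each line is a comma-separated list of numbers and that a missing value is written as '?'. The class name UciRowParser and its methods are hypothetical names chosen for this illustration, and the sample row in main is made up rather than copied from either dataset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka or of the assignment.
class UciRowParser {

    /** Parses one comma-separated row; a '?' field becomes Double.NaN. */
    static double[] parseRow(String line) {
        String[] fields = line.trim().split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            // Assumption: '?' is the only marker used for missing values.
            values[i] = field.equals("?") ? Double.NaN : Double.parseDouble(field);
        }
        return values;
    }

    /** Parses several lines, skipping blank ones. */
    static List<double[]> parseAll(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(parseRow(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up row for illustration only.
        double[] row = parseRow("63,1,?,145,0");
        System.out.println(row.length + " values, third missing: " + Double.isNaN(row[2]));
    }
}
```

Representing a missing value as NaN keeps every row the same length; how to treat those entries (skip the instance, impute, and so on) is left to your classifier.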
