Introduction
The course focuses on machine learning over structured data (relational databases) and involves both theoretical and practical exercises. This is a 3.0-credit course.
The course teaches machine learning (ML) techniques applied to relational databases, which are collections of tables with interconnected records. To this end, it covers learning from tables and graphs in addition to relational databases, as well as the relationships among ML methods across these three data representations. For each data representation, the course covers a range of approaches, from traditional to state-of-the-art, including feature engineering and automatic feature generation, low-dimensional embeddings, deep learning methods (e.g., Graph Neural Networks), and transformer-based foundation models. These techniques span different levels of generalization, ranging from dataset-specific and schema-specific learning to cross-schema learning.
The course includes programming assignments and a final project. After each homework assignment, you will be required to complete a personal written questionnaire in class, in which you describe your solution and demonstrate that you understand it. Attendance is mandatory on questionnaire days, which will be announced in advance.
All lectures will be given in English.
Prerequisites
Intro to ML (236756) or an equivalent course.
Homework
There will be 3-4 homework assignments published during the semester (at intervals of 2-3 weeks), consisting of theoretical and/or programming questions. Programming must be done in Python (.py files or Jupyter notebooks). Submission is in pairs only.
Grade
The course grade will be based on the homework exercises and a final project.
