The course will cover algorithmic techniques for handling large amounts of data. A rough breakdown of the weekly plan is as follows:
2 weeks - Sketching algorithms. What happens when you have to compute on a large stream of data that you can only see once, and store a small portion of?
3 weeks - Dimensionality reduction techniques: How do you represent a high dimensional dataset using few dimensions, without sacrificing too much information?
4 weeks - Large matrices - SVD (singular value decomposition) and methods for approximating it for large matrices in streaming and online setting. Large scale linear algebra (matrix multiplication, linear regression)? How can we complete matrices using few observations of its coordinates (the Netflix problem)?
2 weeks - Lower bounds and complexity issues (for streaming and dimensionality reduction)
Another 2 weeks will be dedicated to recent papers and state-of-the-art results.
Grade composition:
- 2 or 3 problem sets, to be submitted individually. (33.3%)
- One mini-project (more information during first third of class) (33.3%)
- One summary of class written in academic level (33.3%)