The course will cover algorithmic techniques for handling large amounts of data. A rough breakdown of the weekly plan is as follows:
2 weeks - Sketching algorithms. What happens when you have to compute on a large stream of data that you can only see once, and store a small portion of?
3 weeks - Dimensionality reduction techniques: How do you represent a high dimensional dataset using few dimensions, without sacrificing too much information?
4 weeks - Large matrices - SVD (singular value decomposition) and methods for approximating it for large matrices in streaming and online setting. Large scale linear algebra (matrix multiplication, linear regression)? How can we complete matrices using few observations of its coordinates (the Netflix problem)?
2 weeks - Lower bounds and complexity issues (for streaming and dimensionality reduction)
Another 2 weeks will be dedicated to recent papers and state-of-the-art results.
Grade composition:
50% Homework problem sets (3 or 4)
Must be typed in using latex, or similar quality.
40% final project (details and options to choose from TBD)
10% Attendance