“Our warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB. In the last year, the warehouse has seen a 3x growth in the amount of data stored. Given this growth trajectory, storage efficiency is and will continue to be a focus for our warehouse infrastructure.”
(Facebook, April 2014)
Erasure coding is an important research area in information theory. Current work on new codes and techniques aims to minimize storage overhead and reconstruction time without interfering with the storage system's normal operation. The goal of this seminar is to understand the benefits, limitations, and tradeoffs of erasure codes in the context of real-world, large-scale distributed storage systems. We will survey promising new coding techniques, as well as practical experience with deploying them in large-scale data centers and distributed storage systems.
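To make the tradeoff between storage overhead and reconstruction cost concrete, here is a minimal sketch comparing plain replication with a maximum distance separable (MDS) code such as Reed-Solomon. The parameters (3 copies, 10 data plus 4 parity blocks) and all names in the code are illustrative assumptions, not taken from the quoted system. The 300 PB figure is treated as logical data size purely for illustration, since the quote does not say whether it counts raw or logical bytes. The repair cost assumes conventional MDS repair, which reads k surviving blocks to rebuild one lost block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme described by its block counts (hypothetical helper)."""
    name: str
    data_blocks: int    # k: blocks holding original data
    total_blocks: int   # n: data blocks plus redundant blocks
    repair_reads: int   # blocks read to rebuild one lost block

    @property
    def storage_overhead(self) -> float:
        # Raw bytes stored per logical byte.
        return self.total_blocks / self.data_blocks

    @property
    def tolerated_failures(self) -> int:
        # Block losses that can be survived without losing data.
        return self.total_blocks - self.data_blocks


def replication(copies: int) -> Scheme:
    # Each block is stored `copies` times; repair copies one surviving replica.
    return Scheme(f"{copies}x replication", 1, copies, 1)


def mds_code(k: int, m: int) -> Scheme:
    # Any k of the k + m blocks suffice to decode; conventional repair reads k blocks.
    return Scheme(f"MDS({k}+{m})", k, k + m, k)


def raw_petabytes(logical_pb: float, scheme: Scheme) -> float:
    # Raw capacity needed to hold `logical_pb` of data under `scheme`.
    return logical_pb * scheme.storage_overhead


if __name__ == "__main__":
    LOGICAL_PB = 300  # assumption: borrowed from the quote for illustration only
    for scheme in (replication(3), mds_code(10, 4)):
        print(
            f"{scheme.name}: overhead {scheme.storage_overhead:.2f}x, "
            f"tolerates {scheme.tolerated_failures} lost blocks, "
            f"repair reads {scheme.repair_reads} block(s), "
            f"raw capacity {raw_petabytes(LOGICAL_PB, scheme):.0f} PB"
        )
```

Under these assumptions the MDS code stores less than half the raw bytes of 3x replication while tolerating more failures, but repairing a single lost block reads ten blocks instead of one. That repair traffic is the reconstruction cost that newer code constructions try to reduce.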