The complete list of papers is below. Some of them will not be covered if not enough students register.
See the Google calendar link for the list of scheduled lectures; please choose the one you will present from this list. If there are more students than lectures, we may add a complementary lecture (in place of a missing one).
1. Architecture
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, PPoPP 08, Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, Wen-mei W. Hwu
2. Subsystems
Memory
Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, John D. Owens: Efficient computation of sum-products on GPUs through software-managed cache. ICS 2008: 309-318
Thread scheduling, synchronization
http://eprints.cs.vt.edu/archive/00001087/01/TR_GPU_synchronization.pdf
http://synergy.cs.vt.edu/pubs/papers/xiao-icpads2009-gpu.pdf
CPU-GPU cooperation
Jeff A. Stuart, John D. Owens, Message passing on data-parallel architectures. IPDPS09
I. Gelado, J.E. Stone, J. Cabezas, S. Patel, N. Navarro and W.W. Hwu. "An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems". To appear in 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'10). March 2010. Pittsburgh, PA.
3. Basic algorithms, data structures
Blelloch book: chapter 3 (vector model), chapter 5 (vector data structures), and chapter 7 (graph algorithms):
http://www.cs.cmu.edu/~blelloch/papers/Ble90.pdf
4. Parallel Sort
Designing efficient sorting algorithms for manycore GPUs (Satish, Harris, Garland) IPDPS 2009
A Practical Quicksort Algorithm for Graphics Processors (Graph search, Daniel Cederman and Philippas Tsigas) ESA08
5. Graph algorithms
GPU Accelerated Pathfinding, Avi Bleiweiss, Graphics Hardware 08
All-Pairs Shortest-Paths for Large Graphs on the GPU. Gary J. Katz, Joseph T. Kider Jr., Graphics Hardware 08
6. Parallel hashing and list ranking
Dan A. Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, John D. Owens, Nina Amenta: Real-time parallel hashing on the GPU. ACM Trans. Graph. 28(5): (2009)
M. Suhail Rehman, K. Kothapalli, P.J. Narayanan. Fast and Scalable List Ranking on the GPU. 23rd International Conference on Supercomputing (ICS09)
7. Scan
Yuri Dotsenko, Naga K. Govindaraju, Peter-Pike J. Sloan, Charles Boyd, John Manferdelli: Fast scan algorithms on graphics processors. 205-21
Shubhabrata Sengupta, Mark Harris, Yao Zhang, John D. Owens: Scan primitives for GPU computing. Graphics Hardware 2007: 97-106
S. Sengupta, M. Harris, and M. Garland. Efficient parallel scan algorithms for GPUs. NVIDIA Technical Report NVR-2008-003, December 2008.
Also here: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
Applications:
Data mining
A Translation System for Enabling Data Mining Applications on GPUs. ICS09
Accelerating DBs Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures. Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, and Kenneth I. Joy
Databases
Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. Relational Joins on Graphics Processors. ACM SIGMOD 2008.
Bingsheng He, Naga K. Govindaraju, Qiong Luo, and Burton Smith. Efficient Gather and Scatter Operations on Graphics Processors. ACM/IEEE SuperComputing (SC), Nov 2007.
with some more from here: http://www.nvidia.com/object/data_mining_analytics_database.html
Map reduce
Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, and Tuyong Wang. Mars: A MapReduce Framework on Graphics Processors. PACT 2008.
Bryan Catanzaro, Narayanan Sundaram and Kurt Keutzer, Berkeley, A Map Reduce Framework for Programming Graphics Processors