CS6245: Parallelizing Compilers
CRN-34760
All computers today are parallel computers. Are you curious about how to generate or write code for parallel processors (vector, multicore, GPU, clusters) in modern computing systems, and how to better use memory hierarchies (scratchpads, caches, TLBs)? What program properties enable or prevent an application from exploiting the parallelism and locality of modern hardware? These questions have taken on new importance now that parallelism is ubiquitous in hardware with the end of Dennard scaling and Moore's Law, and has become critical for newer application domains including machine learning and data analytics. To address these questions, this course covers the foundations of compilation techniques for parallel computer systems, including the analysis of data and control dependences and program transformations that enhance fine-grained parallelism, register/cache locality, and coarse-grained parallelism. By the end of this course, students will understand the strengths and limitations of state-of-the-art compilers, from the viewpoints of both the application developer and the compiler developer. The techniques taught in this course are relevant to anyone interested in making software execute efficiently on current and future parallel processors, whether the parallelism is created manually or through the use of compilers.
Learning Objectives
Upon successful completion of this course, you should be able to:
- Optimize the performance of compiler-generated code and HPC programs.
- Understand the capabilities and limitations of loop transformations for locality and parallelism (distribution, fusion, interchange, skewing, tiling, and polyhedral transformations).
- Design compiler extensions to perform automatic parallelization and related high-level code optimizations.
- Through your project, learn about the state of the art in current research on a topic of your choice related to optimizing compilers. Your project report should include: (1) an abstract problem statement that can be stated formally, (2) an overview of your solution approach, (3) demonstration of the practicality of your approach through a prototype implementation in a compiler framework or through performance studies of hand-coded examples, and (4) a comparison with related work.
Policies
See the Course Policies for more information on grading, assignments, and other course policies.
Course Information
- Course Code: CS6245
- CRN: 34760
- Credits: 3
- Instructor: Willow Ahrens (ahrens@gatech.edu)
- Office Hours: 12:00 PM - 1:00 PM on Tuesdays in Klaus Advanced Computing Building, Room 3144
- TA: Ruchika R Shirsath (rshirsath3@gatech.edu) and Joel Mathew Cherian (jcherian32@gatech.edu)
- TA Office Hours: 11:00 AM - 12:00 PM on Thursdays in Klaus Advanced Computing Building, Room 2108
- Class Room: College of Computing Room 101
- Class Time: Monday & Wednesday, 5:00–6:15 PM
Course Materials
Textbook: Randy Allen and Ken Kennedy, Optimizing Compilers for Modern Architectures, Morgan Kaufmann, Second Printing, 2005.
Optional Textbook: Alfred V. Aho, Jeffrey D. Ullman, Ravi Sethi, Monica S. Lam, Compilers: Principles, Techniques, and Tools, 2nd edition, 2011.
Additional Readings: Additional readings will be recommended throughout the course. You will need to authenticate to the Georgia Tech Library Proxy to access the official versions of these readings. For convenience, try adding the papers you read to a citation manager, such as Zotero or Mendeley.
Assignments
- There will be 4 homework assignments, a midterm exam, a final exam, and a final project.
- Additionally, there will be in-class worksheets to help you practice concepts.
- Please see the Course Policies for more information on grading.
Schedule
| Date | Topic | Assignments | Reading |
|---|---|---|---|
| Mon, Jan 12 | Course Overview, Motivation, Preview | | A New Golden Age for Computer Architecture; How to Read a Paper |
| Wed, Jan 14 | Timing and Measurement | Homework 1 Assigned | SIGPLAN Guidelines for Empirical Measurement; Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results; Producing wrong data without doing anything obviously wrong! |
| Mon, Jan 19 | NO CLASS - SCHOOL HOLIDAY | | |
| Wed, Jan 21 | Performance Modeling and Optimization | | Chapter 1 of Allen and Kennedy; Roofline: an insightful visual performance model for multicore architectures |
| Mon, Jan 26 | NO CLASS - SNOW DAY | | |
| Wed, Jan 28 | Data Dependence | | Chapter 2.1-2.3 of Allen and Kennedy |
| Mon, Feb 2 | Dependence and Parallelism | | Chapter 2.4 of Allen and Kennedy |
| Wed, Feb 4 | Dependence Testing | | Chapter 3 of Allen and Kennedy |
| Fri, Feb 6 | | Homework 1 Due | |
| Mon, Feb 9 | Term Rewriting | | Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies; Software Design for Flexibility, Chapter 4 |
| Wed, Feb 11 | Syntax and Semantics | | Types and Programming Languages by Benjamin C. Pierce; Heartbeat Scheduling: Provable Efficiency for Nested Parallelism |
| Mon, Feb 16 | Dataflow and Preliminary Transformations | Homework 2 Assigned | Chapter 4 of Allen and Kennedy; Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9; Lecture 4, Dataflow Analysis, Cornell CS 6120 |
| Wed, Feb 18 | Domain-Specific Languages | | Programming Pearls: Little Languages; Halide: decoupling algorithms from schedules for high-performance image processing; DSL4HPC |
| Fri, Feb 20 | | Final Project Proposals Due | |
| Mon, Feb 23 | Enhancing Fine-Grained Parallelism | | Chapter 5 of Allen and Kennedy |
| Wed, Feb 25 | Enhancing Coarse-Grained Parallelism | | Chapter 6 of Allen and Kennedy |
| Fri, Feb 27 | | Homework 2 Due | |
| Mon, Mar 2 | Unimodular Transformations | | Presburger Formulas and Polyhedral Compilation; Compilers: Principles, Techniques, and Tools, Chapter 10.4, 11 |
| Wed, Mar 4 | Midterm | Homework 3 Assigned | |
| Mon, Mar 9 | Compiler Management of Registers | | Chapter 8 of Allen and Kennedy |
| Wed, Mar 11 | Compiler Management of Cache | | Chapter 9 of Allen and Kennedy |
| Mon, Mar 16 | Dependence and Control Flow | | Chapter 7 of Allen and Kennedy; All you need is superword-level parallelism: systematic control-flow vectorization with SLP |
| Wed, Mar 18 | Polyhedral Compilation | | Presburger Formulas and Polyhedral Compilation; Compilers: Principles, Techniques, and Tools, Chapter 10.4, 11 |
| Fri, Mar 20 | | Homework 3 Due | |
| Mon, Mar 23 | NO CLASS - SPRING BREAK | | |
| Wed, Mar 25 | NO CLASS - SPRING BREAK | | |
| Mon, Mar 30 | Loop Fusion | Homework 4 Assigned | Chapter 6 of Allen and Kennedy |
| Wed, Apr 1 | Scalarization | | Chapter 13 of Allen and Kennedy |
| Mon, Apr 6 | Autoscheduling | | OpenTuner: An Extensible Framework for Program Autotuning; Autotuning in High-Performance Computing Applications |
| Wed, Apr 8 | Compilers for Irregular Data | | The tensor algebra compiler; Looplets: A Language for Structured Coiteration |
| Mon, Apr 13 | Compilers for Accelerators | | Task-Based Tensor Computations on Modern GPUs |
| Wed, Apr 15 | Compilers for Distributed Memory | | Legion: Expressing Locality and Independence with Logical Regions; DISTAL: The Distributed Tensor Algebra Compiler |
| Fri, Apr 17 | | Homework 4 Due | |
| Mon, Apr 20 | Final Project Presentations / Guest Lecture | | |
| Wed, Apr 22 | Final Project Presentations / Guest Lecture | | |
| Mon, Apr 27 | Final Project Presentations / Guest Lecture | | |
| Fri, May 1 | Final Exam (6:00 PM - 8:50 PM) | | |
Inspired by
- GA Tech CS 8803-DSL: Domain Specific Languages for High Performance Computing, Willow Ahrens.
- Stanford CS343D: Domain-Specific Programming Models and Compilers, Fredrik Kjolstad.
- MIT 6.172: Performance Engineering of Software Systems
- Cornell CS 6120: Advanced Compilers
- UC Berkeley CS267: Applications of Parallel Computers, Aydin Buluc, Kathy Yelick, James Demmel.
- UC Berkeley CS294: Building User-Centered Programming Tools
- MIT 6.S894: Accelerated Computing