CS6245: Parallelizing Compilers

CRN-34760

All computers today are parallel computers. Are you curious about how to generate or write code for parallel processors (vector, multicore, GPU, clusters) in modern computing systems, and how to better use memory hierarchies (scratchpads, caches, TLBs)? What program properties enable or prevent an application from exploiting the parallelism and locality of modern hardware? These questions have taken on new importance now that parallelism is ubiquitous in hardware with the end of Dennard Scaling and Moore’s Law, and critical for newer application domains including machine learning and data analytics. To address these questions, this course will cover the foundations of compilation techniques for parallel computer systems, including the analysis of data and control dependences and program transformations to enhance fine-grained parallelism, register/cache locality, and coarse-grained parallelism. By the end of this course, students will be knowledgeable about the strengths and limitations of state-of-the-art compilers, from the viewpoints of both the application developer and the compiler developer. The techniques taught in the course should be relevant to anyone interested in enabling software to execute efficiently on current and future parallel processors, whether the parallelism is created manually or through the use of compilers.

Learning Objectives

Upon successful completion of this course, you should be able to:

  • Optimize the performance of compiler-generated code and HPC programs.
  • Understand the capabilities and limitations of loop transformations for locality and parallelism (distribution, fusion, interchange, skewing, tiling, and polyhedral transformations).
  • Design compiler extensions to perform automatic parallelization and related high-level code optimizations.
  • Through your project, learn about the state of the art in current research on a topic of your choice related to optimizing compilers. Your project report should include: (1) an abstract problem statement that can be stated formally, (2) an overview of your solution approach, (3) demonstration of the practicality of your approach through a prototype implementation in a compiler framework or through performance studies of hand-coded examples, and (4) a comparison with related work.

Policies

See the Course Policies for more information on grading, assignments, and other course policies.

Course Information

  • Course Code: CS6245
  • CRN: 34760
  • Credits: 3
  • Instructor: Willow Ahrens (ahrens@gatech.edu)
  • Office Hours: 12:00 PM - 1:00 PM on Tuesdays in Klaus Advanced Computing Building, Room 3144
  • TA: Ruchika R Shirsath (rshirsath3@gatech.edu) and Joel Mathew Cherian (jcherian32@gatech.edu)
  • TA Office Hours: 11:00 AM - 12:00 PM on Thursdays in Klaus Advanced Computing Building, Room 2108
  • Class Room: College of Computing Room 101
  • Class Time: Monday & Wednesday, 5:00–6:15 PM

Course Materials

  • Textbook: Randy Allen and Ken Kennedy, Optimizing Compilers for Modern Architectures, Morgan Kaufmann, Second Printing, 2005.

  • Optional Textbook: Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd edition, 2011.

  • Additional Readings: Additional readings will be recommended throughout the course. You will need to authenticate to the Georgia Tech Library Proxy to access the official versions of these readings. For convenience, try adding the papers you read to a citation manager, such as Zotero or Mendeley.

Assignments

  • There will be 4 homework assignments, a midterm exam, a final exam, and a final project.
  • Additionally, there will be in-class worksheets to help you practice concepts.
  • Please see the Course Policies for more information on grading.

Schedule

| Date | Topic | Assignments | Reading |
| --- | --- | --- | --- |
| Mon, Jan 12 | Course Overview, Motivation, Preview | | A New Golden Age for Computer Architecture; How to Read a Paper |
| Wed, Jan 14 | Timing and Measurement | Homework 1 Assigned | SIGPLAN Guidelines for Empirical Measurement; Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results; Producing wrong data without doing anything obviously wrong! |
| Mon, Jan 19 | NO CLASS - SCHOOL HOLIDAY | | |
| Wed, Jan 21 | Performance Modeling and Optimization | | Chapter 1 of Allen and Kennedy; Roofline: an insightful visual performance model for multicore architectures |
| Mon, Jan 26 | NO CLASS - SNOW DAY | | |
| Wed, Jan 28 | Data Dependence | | Chapters 2.1-2.3 of Allen and Kennedy |
| Mon, Feb 2 | Dependence and Parallelism | | Chapter 2.4 of Allen and Kennedy |
| Wed, Feb 4 | Dependence Testing | | Chapter 3 of Allen and Kennedy |
| Fri, Feb 6 | | Homework 1 Due | |
| Mon, Feb 9 | Term Rewriting | | Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies; Software Design for Flexibility, Chapter 4 |
| Wed, Feb 11 | Syntax and Semantics | | Types and Programming Languages by Benjamin C. Pierce; Heartbeat Scheduling: Provable Efficiency for Nested Parallelism |
| Mon, Feb 16 | Dataflow and Preliminary Transformations | Homework 2 Assigned | Chapter 4 of Allen and Kennedy; Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9; Lecture 4, Dataflow Analysis, Cornell CS 6120 |
| Wed, Feb 18 | Domain-Specific Languages | | Programming Pearls: Little Languages; Halide: decoupling algorithms from schedules for high-performance image processing; DSL4HPC |
| Fri, Feb 20 | | Final Project Proposals Due | |
| Mon, Feb 23 | Enhancing Fine-Grained Parallelism | | Chapter 5 of Allen and Kennedy |
| Wed, Feb 25 | Enhancing Coarse-Grained Parallelism | | Chapter 6 of Allen and Kennedy |
| Fri, Feb 27 | | Homework 2 Due | |
| Mon, Mar 2 | Unimodular Transformations | | Presburger Formulas and Polyhedral Compilation; Compilers: Principles, Techniques, and Tools, Chapters 10.4 and 11 |
| Wed, Mar 4 | Midterm | Homework 3 Assigned | |
| Mon, Mar 9 | Compiler Management of Registers | | Chapter 8 of Allen and Kennedy |
| Wed, Mar 11 | Compiler Management of Cache | | Chapter 9 of Allen and Kennedy |
| Mon, Mar 16 | Dependence and Control Flow | | Chapter 7 of Allen and Kennedy; All you need is superword-level parallelism: systematic control-flow vectorization with SLP |
| Wed, Mar 18 | Polyhedral Compilation | | Presburger Formulas and Polyhedral Compilation; Compilers: Principles, Techniques, and Tools, Chapters 10.4 and 11 |
| Fri, Mar 20 | | Homework 3 Due | |
| Mon, Mar 23 | NO CLASS - SPRING BREAK | | |
| Wed, Mar 25 | | | |
| Mon, Mar 30 | Loop Fusion | Homework 4 Assigned | Chapter 6 of Allen and Kennedy |
| Wed, Apr 1 | Scalarization | | Chapter 13 of Allen and Kennedy |
| Mon, Apr 6 | Autoscheduling | | OpenTuner: An Extensible Framework for Program Autotuning; Autotuning in High-Performance Computing Applications |
| Wed, Apr 8 | Compilers for Irregular Data | | The tensor algebra compiler; Looplets: A Language for Structured Coiteration |
| Mon, Apr 13 | Compilers for Accelerators | | Task-Based Tensor Computations on Modern GPUs |
| Wed, Apr 15 | Compilers for Distributed Memory | | Legion: Expressing Locality and Independence with Logical Regions; DISTAL: The Distributed Tensor Algebra Compiler |
| Fri, Apr 17 | | Homework 4 Due | |
| Mon, Apr 20 | Final Project Presentations / Guest Lecture | | |
| Wed, Apr 22 | Final Project Presentations / Guest Lecture | | |
| Mon, Apr 27 | Final Project Presentations / Guest Lecture | | |
| Fri, May 1 | Final Exam (6:00 PM - 8:50 PM) | | |
