CS6245: Parallelizing Compilers

CRN-34760

All computers today are parallel computers. Are you curious about how to generate or write code for parallel processors (vector, multicore, GPU, clusters) in modern computing systems, and how to better use memory hierarchies (scratchpads, caches, TLBs)? What program properties enable or prevent an application from exploiting the parallelism and locality of modern hardware? These questions have taken on new importance now that parallelism is ubiquitous in hardware with the end of Dennard Scaling and Moore’s Law, and critical for newer application domains including machine learning and data analytics. To address these questions, this course will cover the foundations of compilation techniques for parallel computer systems, including the analysis of data and control dependences and program transformations to enhance fine-grained parallelism, register/cache locality, and coarse-grained parallelism. By the end of this course, students will be knowledgeable about the strengths and limitations of state-of-the-art compilers, from the viewpoints of both the application developer and the compiler developer. The techniques taught in the course should be relevant to anyone interested in enabling software to execute efficiently on current and future parallel processors, whether the parallelism is created manually or through the use of compilers.

Learning Objectives

Upon successful completion of this course, you should be able to:

  • Optimize the performance of compiler-generated code and HPC programs.
  • Understand the capabilities and limitations of loop transformations for locality and parallelism (distribution, fusion, interchange, skewing, tiling, and polyhedral transformations).
  • Design compiler extensions to perform automatic parallelization and related high-level code optimizations.
  • Through your project, learn about the state of the art in current research on a topic of your choice related to optimizing compilers. Your project report should include: (1) an abstract problem statement that can be stated formally, (2) an overview of your solution approach, (3) demonstration of the practicality of your approach through a prototype implementation in a compiler framework or through performance studies of hand-coded examples, and (4) a comparison with related work.

Policies

See the Course Policies for more information on grading, assignments, and other course policies.

Course Information

  • Course Code: CS6245
  • CRN: 34760
  • Credits: 3
  • Instructor: Willow Ahrens (ahrens@gatech.edu)
  • Office Hours: 12:00 PM - 1:00 PM on Tuesdays in Klaus Advanced Computing Building, Room 3144
  • TA: Ruchika R Shirsath (rshirsath3@gatech.edu) and Joel Mathew Cherian (jcherian32@gatech.edu)
  • TA Office Hours: 11:00 AM - 12:00 PM on Thursdays in Klaus Advanced Computing Building, Room 2108
  • Class Room: College of Computing Room 101
  • Class Time: Monday & Wednesday, 5:00–6:15 PM

Course Materials

  • Textbook: Randy Allen and Ken Kennedy, Optimizing Compilers for Modern Architectures, Morgan Kaufmann, Second Printing, 2005.

  • Optional Textbook: Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd edition, 2011.

  • Additional Readings: Additional readings will be recommended throughout the course. You will need to authenticate to the Georgia Tech Library Proxy to access the official versions of these readings. For convenience, try adding the papers you read to a citation manager, such as Zotero or Mendeley.

Assignments

  • There will be 4 homework assignments, a midterm exam, a final exam, and a final project.
  • Additionally, there will be in-class worksheets to help you practice concepts.
  • Please see the Course Policies for more information on grading.

Schedule

| Date | Topic | Assignments | Reading |
| --- | --- | --- | --- |
| Mon, Jan 12 | Course Overview, Motivation, Preview | | A New Golden Age for Computer Architecture; How to Read a Paper |
| Wed, Jan 14 | Timing and Measurement | Homework 1 Assigned | SIGPLAN Guidelines for Empirical Measurement; Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results; Producing wrong data without doing anything obviously wrong! |
| Mon, Jan 19 | NO CLASS - SCHOOL HOLIDAY | | |
| Wed, Jan 21 | Performance Modeling and Optimization | | Chapter 1 of Allen and Kennedy; Roofline: an insightful visual performance model for multicore architectures |
| Mon, Jan 26 | NO CLASS - SNOW DAY | | |
| Wed, Jan 28 | Data Dependence | | Chapters 2.1-2.3 of Allen and Kennedy |
| Mon, Feb 2 | Dependence and Parallelism | | Chapter 2.4 of Allen and Kennedy |
| Wed, Feb 4 | Dependence Testing | | Chapter 3 of Allen and Kennedy |
| Fri, Feb 6 | | Homework 1 Due | |
| Mon, Feb 9 | Term Rewriting | | Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies; Software Design for Flexibility, Chapter 4 |
| Wed, Feb 11 | Syntax and Semantics | | Types and Programming Languages by Benjamin C. Pierce; Heartbeat Scheduling: Provable Efficiency for Nested Parallelism |
| Mon, Feb 16 | Dataflow and Preliminary Transformations | Homework 2 Assigned | Chapter 4 of Allen and Kennedy; Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9; Lecture 4, Dataflow Analysis, Cornell CS 6120 |
| Wed, Feb 18 | Domain-Specific Languages | | Programming Pearls: Little Languages; Halide: decoupling algorithms from schedules for high-performance image processing; DSL4HPC |
| Fri, Feb 20 | | Final Project Proposals Due | |
| Mon, Feb 23 | Enhancing Fine-Grained Parallelism | | Chapter 5 of Allen and Kennedy |
| Wed, Feb 25 | Enhancing Coarse-Grained Parallelism | | Chapter 6 of Allen and Kennedy |
| Fri, Feb 27 | | Homework 2 Due | |
| Mon, Mar 2 | Unimodular Transformations | | Presburger Formulas and Polyhedral Compilation; Compilers: Principles, Techniques, and Tools, Chapters 10.4 and 11 |
| Wed, Mar 4 | Midterm | Homework 3 Assigned | |
| Mon, Mar 9 | Compiler Management of Registers | | Chapter 8 of Allen and Kennedy |
| Wed, Mar 11 | Compiler Management of Cache | | Chapter 9 of Allen and Kennedy |
| Mon, Mar 16 | Dependence and Control Flow | | Chapter 7 of Allen and Kennedy; All you need is superword-level parallelism: systematic control-flow vectorization with SLP |
| Wed, Mar 18 | Polyhedral Compilation | | Presburger Formulas and Polyhedral Compilation; Compilers: Principles, Techniques, and Tools, Chapters 10.4 and 11 |
| Fri, Mar 20 | | Homework 3 Due | |
| Mon, Mar 23 | NO CLASS - SPRING BREAK | | |
| Wed, Mar 25 | | | |
| Mon, Mar 30 | Loop Fusion | Homework 4 Assigned | Chapter 6 of Allen and Kennedy |
| Wed, Apr 1 | Scalarization | | Chapter 13 of Allen and Kennedy |
| Mon, Apr 6 | Autoscheduling | | OpenTuner: An Extensible Framework for Program Autotuning; Autotuning in High-Performance Computing Applications |
| Wed, Apr 8 | Compilers for Irregular Data | | The tensor algebra compiler; Looplets: A Language for Structured Coiteration |
| Mon, Apr 13 | Compilers for Accelerators | | Task-Based Tensor Computations on Modern GPUs |
| Wed, Apr 15 | Compilers for Distributed Memory | | Legion: Expressing Locality and Independence with Logical Regions; DISTAL: The Distributed Tensor Algebra Compiler |
| Fri, Apr 17 | | Homework 4 Due | |
| Mon, Apr 20 | Final Project Presentations / Guest Lecture | | |
| Wed, Apr 22 | Final Project Presentations / Guest Lecture | | |
| Mon, Apr 27 | Final Project Presentations / Guest Lecture | | |
| Fri, May 1 | Final Exam (6:00 PM - 8:50 PM) | | |
